648 




ALRAND 
REPORT  50A 




STATISTICAL  TRAINING  MANUAL 

Volume  II 


10  November  1966 


PREFACE 


This report is based on a series of lectures on probability and

I. CONFIDENCE


A.  Floating  Interval. 

We concluded our previous course while we were discussing how
the basics of sampling theory are used to obtain information about samples
randomly drawn from a known population. Specifically, we considered the
mean (x̄) or sum S(x) of the elements of a sample and how the means or
sums of other random samples of the same size from the same universe
might be related to this particular one. We considered the distribution
of this sample statistic and declared the means to be normally distributed
when the sample size was 30 or more. So we used a sample in class to learn
more about samples by way of their means and also by way of their standard
deviations.

The everyday practical problem requires us to use known samples
and infer conclusions about the unknown population from which the sample
comes; e.g., what is the population mean when the sample mean is known?
Our rudiments of sampling theory will help us to make such a determination.
Initially we will consider the problem of describing the population parameter
from its corresponding sample statistic when the sample statistic is
the mean.

The oldest method of making such an estimate was introduced by
Laplace in 1814 in dealing with the problem of inferring the value of
the probability of success (p) in the binomial distribution from an observed
value of the random variable x of the distribution. He regarded the
size of an interval which would include p as fixed but thought of p as
a random variable. He was confident that a certain percentage of the
time p would be in the interval, and as a result it later became known as
a confidence interval. It was not until 1927 that the correct interpretation
of the interval as a random or floating interval was given by E. B. Wilson.
Let us go through such an argument.

First, recall a few general facts. The sampling distribution of
means is a frequency distribution of the means of all samples of a particular
size, each of which is drawn randomly from the same population. The
mean of the sampling distribution of means equals the population mean,
although individual sample means may vary quite a bit from this value.
However, their variability is much smaller than the variability of the
observations in the population: their standard deviation is only 1/√n
times that of the population, so it decreases as the sample size increases.
For a large sample the standard deviation of the values in the sample will
not be very different from that of the population. Finally, for large
samples, the sampling distribution of the mean is essentially a normal
distribution.

So we estimate the population standard deviation σ_x by the sample
standard deviation; call it s. Then we obtain an estimated standard error
(deviation) of the mean by dividing s by √n, where n is the sample size;
we call this s_x̄ = s/√n. Next, in making a guess about the population
mean, we decide what level of confidence (probability of being correct)
we want. This determines for us the confidence interval, or confidence
limits, within which the population mean should lie. It specifies
a range of values. To increase the confidence level we must make the
estimate less precise. On the other hand, we can be more precise if we
are willing to take a bigger risk (less confidence). For example, suppose
we have sampled a universe and obtained a mean of 100 and an estimated
standard error of the mean of 7. We desire a confidence level of 95%, which
means we expect the sample mean to be within our confidence interval 95 times
in 100 samples. Taking the interval as ±2 standard errors, we would therefore
expect to experience sample means between 86 and 114 in all but 5% of the
samples drawn. Now if we desire to be more precise, we narrow the confidence
interval. If we set the confidence interval at 93 to 107 (±1 standard error),
we expect our mean to be within it only 68% of the time. So precision is
sacrificed to a high level of confidence, and vice versa. However, both
precision and confidence level can be increased by increasing the sample size.
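A minimal Python sketch of the trade-off in the example above. It treats the 7 as the standard error of the mean (the reading that produces the 86 to 114 and 93 to 107 intervals), and `interval` is a hypothetical helper name, not something taken from the course.

```python
def interval(mean, std_error, z):
    """Return the interval from mean - z*std_error to mean + z*std_error."""
    return (mean - z * std_error, mean + z * std_error)

# Normal theory: z = 1 standard error covers about 68%, z = 2 about 95%.
for z, label in [(1, "about 68%"), (2, "about 95%")]:
    low, high = interval(100, 7, z)
    print(f"z = {z} ({label}): {low} to {high}")
```

Because the standard error is s/√n, quadrupling the sample size halves it, which is how a larger sample buys both precision and confidence at once.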

Now a very important point has been blithely skipped over in the
last paragraph on procedure. Recall that we learned how to calculate
the probability that goes with a certain distance from the mean of a
normal distribution to a value of the variable. How do we suddenly slip
over to using a single value of the variable, and a distance about it,
to pick up the mean? This is the step Laplace failed to conceive.

For example, you will recall that for a fairly normal distribution
of sample means, about 68% of all possible sample means will be within
±1 standard error of the mean of these means, which in turn is the
population mean. About 95% of all possible sample means will be within
±2 standard errors of the population mean. Or we can say that the probability
of finding a sample whose mean is more than 1 standard error from the
population mean is about .32. Also, the probability of the mean of the
sample being more than 2 standard errors from the mean of the population
is only about .05.
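The rounded figures .32 and .05 can be checked against the normal curve. This sketch uses the standard-library `statistics.NormalDist` (Python 3.8 or later); `two_sided_tail` is a hypothetical name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def two_sided_tail(k):
    """Probability that a normal variable falls more than k standard errors from its mean."""
    return 2 * (1 - std_normal.cdf(k))

for k in (1, 2):
    print(f"P(|x̄ - μ| > {k} standard errors) = {two_sided_tail(k):.4f}")
```

The exact values are .3173 and .0455; the tail is exactly .05 at 1.96 standard errors, which is why 1.96 appears in the worked problems later on.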

Consequently we can expect 95% of the time to get a sample whose
mean is no farther from the population mean than ±2 standard
errors of the sample means. Hence the same size interval centered
on every possible sample mean will pick up the population mean about
95% of the time. Therefore, when it is placed on one such sample mean,
we can be 95% confident of picking up the population mean. Actually these
last remarks constitute what we mean by 95% confidence, and as such are
definitions.
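The definition can be watched in action with a small simulation: draw many samples, float an interval of fixed width around each sample mean, and count how often the fixed population mean is caught. This is a sketch only; the population (normal, μ = 50, σ = 10), the sample size 25, and the name `coverage` are assumptions chosen for illustration, and σ is treated as known so every interval has the same width.

```python
import random

def coverage(mu=50.0, sigma=10.0, n=25, z=1.96, trials=10_000, seed=1):
    """Fraction of floating intervals x̄ ± z·σ/√n that contain the fixed mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5  # same width for every interval (σ known)
    hits = 0
    for _ in range(trials):
        # one random sample of size n from a normal population, then its mean
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # the interval floats with x̄; mu stays put
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / trials

print(coverage())  # close to .95; the exact value depends on the seed
```

Calling `coverage(z=1.0)` gives a figure near .68 instead, in line with the ±1 standard error rule.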

The above is so easy to say symbolically that the needed concept of
the floating interval is often lost to the learner. For the situation as
pictured below in Figure 1 we can say

$$\Pr\{\mu - 2\sigma/\sqrt{n} < \bar{x} < \mu + 2\sigma/\sqrt{n}\} \approx .95,$$

which can be rewritten as

$$\Pr\{\bar{x} - 2\sigma/\sqrt{n} < \mu < \bar{x} + 2\sigma/\sqrt{n}\} \approx .95,$$

where n is fairly large and σ_x̄ = σ/√n is estimated by s/√n. This exemplifies

$$(\bar{x} - 2\sigma/\sqrt{n},\ \bar{x} + 2\sigma/\sqrt{n})$$

as an observable random interval such that the probability is .95 that
it contains μ. It is a 95% confidence interval for μ, and .95 is the
confidence coefficient.

We  have  set  up  an  estimator  for  a  parameter  by  using  a  random 
interval  with  a  specified  probability  of  including  the  true  value  of  the 
parameter.  Such  a  device  is  called  an  interval  estimator. 

B. Floating Interval Again.

To summarize, we realize that in a practical situation we have only
one sample and one mean. We have seen how all possible means behave
under chance variation, but we have no way of knowing whether our single
sample mean is at a point A or B or a point C, or at any other point along
the x̄ scale in Figure 2.

[Figure 2: sampling distribution of x̄; μ_x̄ = mean of all x̄'s]

We have said we shall estimate the location of μ_x̄ by going out
a distance d on either side of x̄ and then claim μ_x̄ is in this interval.

Now you know from the theory of the normal curve that if our claim is
to be correct, the distance d will depend upon the width of the hump in
the graph in Figure 3. That is, d depends on the size of σ_x̄. If σ_x̄ is
small, the distance d does not need to be large in order to ensure that
μ_x̄ is between x̄ - d and x̄ + d. If σ_x̄ is large, the sample means are
more scattered, and so a larger d will be necessary for an accurate
estimate of μ_x̄.

The size of σ_x̄ measures the reliability of the mean, or the extent
to which x̄ is expected to be in error from μ_x̄ simply by chance variation.
We have seen that the mean of the sample becomes more reliable as the
size of the sample increases, for then σ_x̄ decreases. So we can always
rely on the sample size to pump more reliability into our estimate if time
and expense permit a larger sample.

On the other hand, even if σ_x̄ is small, there can still be sample
means as far away from μ_x̄ as is the point C in Figure 3. And although
we take a d large enough so that the intervals A ± d and B ± d include
μ_x̄, that is, so that they give a correct claim to the location of μ_x̄, the
same d may not be large enough to make correct the claim that μ_x̄ is
in the interval C ± d. But remember that we are willing to run a specified
risk of making an inaccurate claim.

Let us agree that we need to be only 90% confident that our claim
about μ_x̄ is true; that is, we should expect only 9 out of 10 such claims
to be correct. This means that we shall take d large enough so that
the claim will be correct for 90% of all possible sample means. Now
the area table for the normal curve tells you that 90% of all the cases in
a normal distribution are no more than 1.645 (about 1.6) standard deviations
from the mean of the distribution. So 90% of all sample means are within a
distance of ±1.6σ_x̄ of μ_x̄. Hence if we make the claim that μ_x̄ is in the
interval from x̄ - 1.6σ_x̄ to x̄ + 1.6σ_x̄, we can be 90% confident that our
claim is correct. This is true because only 10% of all possible sample
means are like C, which is farther away from μ_x̄ than 1.6σ_x̄, as illustrated
in Figure 3.
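The coefficients used throughout this section (1.645 for 90%, 1.96 for 95%, about 2.58 for 99%) come from the inverse of the normal distribution function. A short sketch with Python's `statistics.NormalDist`; the name `z_coefficient` is hypothetical.

```python
from statistics import NormalDist

def z_coefficient(confidence):
    """Two-sided coefficient z_c such that P(-z_c < Z < z_c) = confidence."""
    # Each tail holds (1 - confidence) / 2, so invert the CDF at 0.5 + confidence / 2.
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%}: z = {z_coefficient(c):.3f}")
```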


[Figure 3. Three "Floating" Intervals: A - d to A + d, B - d to B + d, and C - d to C + d]
It is worth noting that these confidence intervals are determined
fully, with exact probabilities, without assuming any a priori
probability distribution for the parameter. This may seem paradoxical:
how can we speak of the probability that a parameter lies in
an interval when the parameter has no probability distribution? The
answer lies in the fact that the ends of the interval vary at random in
repetitions of the experiment, while the parameter point remains fixed.

It's all in the way you say it.

There is one additional point to emphasize. In our formulas on
page 5, we use for the standard deviation of the base population the estimate
s from the sample. Justification for this was discussed on page 2.
Now from one sample to another this estimate may change a bit. Hence
in Figure 3 the lengths of the three floating intervals might better be
shown as slightly different when σ is so obtained. However, the
conclusions remain as before.


C.  Project  -  Simulation. 

1. We will select at random 10 samples of size 16 from a population
with known mean and standard deviation.

2. For each sample compute x̄ and s_x.

3. Compute an estimate of the base population mean μ at each
of the confidence levels 90%, 95%, and 99% for each sample above.

4. Use the true base population standard deviation to compute
the same estimates as in 3. Examine to what extent this changes the
results of 3.

5. Note how many of these 10 estimates actually contain the true
base population mean, and compare this to the number we should expect
from the theory of sampling. Again compare the results using the sample
standard deviation with those using the true standard deviation.

To develop our solution, let us assume a frequency distribution as
shown in Table I. Then we will create some device such that the numbers
0 through 10 appear in accordance with our assumed frequency. That is, 0
will appear once and there will be 48 counters marked #5. Numerous gadgets
can be devised. Suppose we have a free-turning gear with 200 teeth. Each
of the teeth is marked with one of our numbers in a random manner, so that
when the gear is set in motion and then stopped, any tooth has an equal chance
of stopping at our reference point. We should expect to see #5 at the reference
point 48 times as often as #10. With this device we can proceed with
the simulation. Another commonly used device for random selection is
to mark slips of paper with numbers and put them into a hat. The hat is
filled with 200 slips of paper, in this case, each marked with a number in
accordance with Table I. The slips of paper (counters) are thoroughly
mixed so that all counters have equal opportunity for selection. In the class
we had no mechanical device, so the hat was used.


Table I

Mark printed on counter, x        0   1   2   3   4   5   6   7   8   9  10
Number of counters, 200 f(x)      1   3  10  23  39  48  39  23  10   3   1
The mean of the total counter population is 5.0 and the true standard
deviation is 1.715.
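The two population figures can be checked directly from Table I. In this sketch the dictionary name `COUNTS` is an assumption for illustration; the variance is the population variance (divided by the 200 counters, not 199).

```python
# Counts from Table I: mark x -> number of counters among the 200.
COUNTS = {0: 1, 1: 3, 2: 10, 3: 23, 4: 39, 5: 48, 6: 39, 7: 23, 8: 10, 9: 3, 10: 1}

total = sum(COUNTS.values())                       # number of counters in the hat
mean = sum(x * f for x, f in COUNTS.items()) / total
variance = sum(f * (x - mean) ** 2 for x, f in COUNTS.items()) / total
sd = variance ** 0.5
print(total, mean, round(sd, 3))
```

The variance works out to 588/200 = 2.94, whose square root rounds to the 1.715 quoted above.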


Select a random sample of 16 counters with replacement. That is,
draw a counter, record its number, and then replace it. Mix the counters
before each selection. (The cooperative efforts of the members of the
class produced the following 10 samples.)
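A computer can stand in for the hat. This sketch builds one list entry per counter and draws with replacement using `random.Random.choices`; the names `HAT` and `draw_sample` and the seed are assumptions. The text does not say which divisor the class used for s_x, so the sketch assumes `statistics.stdev` (divisor n - 1).

```python
import random
from statistics import mean, stdev

# Table I counts: mark x -> number of counters.
COUNTS = {0: 1, 1: 3, 2: 10, 3: 23, 4: 39, 5: 48, 6: 39, 7: 23, 8: 10, 9: 3, 10: 1}
# The "hat": one entry per counter, so each of the 200 counters is equally likely.
HAT = [x for x, f in COUNTS.items() for _ in range(f)]

def draw_sample(rng, size=16):
    """Draw `size` counters with replacement (mix and replace after each draw)."""
    return rng.choices(HAT, k=size)

rng = random.Random(1966)  # arbitrary seed so the run can be repeated
for i in range(1, 11):
    sample = draw_sample(rng)
    print(i, round(mean(sample), 2), round(stdev(sample), 2))
```

A simulated set of ten samples will of course differ from the class's Table II.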


Table II

Sample Number     x̄       s_x
      1          4.38    1.20
      2          3.94    2.00
      3          4.75    1.24
      4          5.13    1.65
      5          5.00    1.86
      6          4.25    2.17
      7          5.13    1.41
      8          5.44    2.16
      9          4.63    2.36
     10          4.94    1.39

Before tabulating the additional required information for each sample,
let us develop it clearly for sample 1. If c represents the degree of
confidence and z_c the corresponding coefficient, or number of standard
deviations from normal theory, then our three intervals are given below,
each followed by "yes" or "no" depending on whether it did or did not
pick up the true mean. (Here √n = √16 = 4.)

c = .90, z_c = 1.65: 4.38 ± (1.65)(1.20)/4 = (3.88, 4.87) No
c = .95, z_c = 1.96: 4.38 ± (1.96)(1.20)/4 = (3.79, 4.97) No
c = .99, z_c = 2.58: 4.38 ± (2.58)(1.20)/4 = (3.60, 5.15) Yes

A  similar  calculation  for  each  of  the  other  nine  samples  yields 
the  results  in  Table  III. 


Table III
Note that when the 90% confidence interval contained the true mean,
we did not bother to calculate an interval for greater confidence, as it
would automatically also contain it.
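The whole tabulation can be sketched in a few lines, using the x̄ and s_x values of Table II and the true mean 5.0. This sketch takes z_c from the normal curve (1.645 rather than the rounded 1.65); the names `SAMPLES`, `interval` and `count_covering` are hypothetical.

```python
from statistics import NormalDist

TRUE_MEAN = 5.0  # mean of the counter population (Table I)
N = 16           # size of each sample
# (x̄, s_x) for the ten class samples, as listed in Table II
SAMPLES = [(4.38, 1.20), (3.94, 2.00), (4.75, 1.24), (5.13, 1.65), (5.00, 1.86),
           (4.25, 2.17), (5.13, 1.41), (5.44, 2.16), (4.63, 2.36), (4.94, 1.39)]

def interval(xbar, s, confidence, n=N):
    """x̄ ± z_c · s/√n, with z_c taken from the standard normal curve."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half = z * s / n ** 0.5
    return xbar - half, xbar + half

def count_covering(confidence):
    """Number of the ten intervals that contain the true mean."""
    return sum(lo < TRUE_MEAN < hi
               for lo, hi in (interval(x, s, confidence) for x, s in SAMPLES))

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%}: {count_covering(c)} of {len(SAMPLES)} intervals contain the mean")
```

Passing the true standard deviation 1.715 in place of each s_x, as in step 4 of the project, shows how much the choice of standard deviation matters.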

In  summary  we  can  write 


Table IV

Confidence in Percentage     Percentage of Times Mean Included
          90                              80
          95                              80
          99                             100

Remember that we are straining the use of normal theory by using
samples as small as 16 in size. However, this strain was eased by
our taking a base population which was itself fairly normal. Also, we
took only 10 such samples and could not possibly obtain percentages of
times the mean was picked up to a finer difference than 10%. Nevertheless,
the above project should instill in you a feeling for and a clear knowledge
of the concept of a confidence interval. Ideally we want the percentage
and the corresponding confidence in each row to agree.

D.  The  Binomial  Distribution. 


We learned earlier that the binomial distribution

$$f(x) = \binom{n}{x} p^x q^{n-x}, \qquad x = 0, 1, \ldots, n,$$

where

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!},$$

n = sample size
p = probability of success
q = 1 - p

is approximately normally distributed with mean np and standard deviation
√(npq). As we saw in the last course, the probability of x being within
a distance of z√(npq) units of np is given approximately by 2F_N(z) - 1,
where F_N is the standard normal distribution function; i.e.,

$$\Pr\{np - z\sqrt{npq} < x < np + z\sqrt{npq}\} \approx 2F_N(z) - 1.$$

For example, if n = 400, p = .2, q = .8, then np = 80, √(npq) = 8, and
for a probability of .95 (when z = 2) we know that the interval (64, 96)
will contain x about 95% of the time.
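A sketch comparing the exact binomial probability for this example with the normal approximation 2F_N(z) - 1. It uses `math.comb` (Python 3.8 or later); the helper name `binomial_prob_between` is hypothetical, and strict inequalities are used to match the statement above.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_prob_between(n, p, low, high):
    """Exact P(low < x < high) for a binomial(n, p) variable (strict inequalities)."""
    q = 1 - p
    return sum(comb(n, x) * p**x * q**(n - x) for x in range(low + 1, high))

n, p, z = 400, 0.2, 2
mu, sd = n * p, sqrt(n * p * (1 - p))   # 80 and 8, as in the text
exact = binomial_prob_between(n, p, round(mu - z * sd), round(mu + z * sd))
approx = 2 * NormalDist().cdf(z) - 1    # 2F_N(z) - 1
print(f"exact {exact:.4f}, normal approximation {approx:.4f}")
```

Both come out close to .95, which is the sense of "about 95% of the time".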

Now suppose we have a binomial distribution in which p is not
known, and from n trials we found x occurrences. We let z_α represent
the coefficient; α would equal 2.5% if we use about 2 (more precisely 1.96)
standard deviations, that is, the 95% confidence level. Then we can say

$$\Pr\left\{-z_\alpha < \frac{x - np}{\sqrt{npq}} < +z_\alpha\right\} \approx 1 - 2\alpha,$$

where Pr{z > z_α} = α. The two confidence limits for p are such that

$$\frac{x - np}{\sqrt{npq}} = \pm z_\alpha,$$

or are the roots of

$$p^2(n^2 + nz_\alpha^2) - p(2nx + nz_\alpha^2) + x^2 = 0.$$

Solving this quadratic in p, we find these two values are

$$p = \frac{x + z_\alpha^2/2 \pm z_\alpha\sqrt{x(n - x)/n + z_\alpha^2/4}}{n + z_\alpha^2}.$$

Note that as the sample size n increases the above formula reduces to
(we must assume x increases so that x/n does not fade, as it is really
this proportion we obtain to estimate p)

$$p \approx \frac{x \pm z_\alpha\sqrt{x(n - x)/n}}{n}.$$

This conforms to our normal theory, which would give

$$\frac{x}{n} \pm z_\alpha\sqrt{\frac{(x/n)(1 - x/n)}{n}}.$$
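Solving the quadratic numerically is a useful check on the algebra. The pair of roots is what is now commonly called the Wilson score interval. In this sketch the name `score_limits` is hypothetical, and the data x = 24, n = 40 are borrowed from Problem 3 later in this section.

```python
from math import sqrt

def score_limits(x, n, z):
    """Roots in p of p²(n² + n·z²) - p(2nx + n·z²) + x² = 0, smaller root first."""
    a = n * n + n * z * z
    b = -(2 * n * x + n * z * z)
    c = x * x
    disc = sqrt(b * b - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)

low, high = score_limits(24, 40, 1.96)
print(f"({low:.3f}, {high:.3f})")
```

The result, about (.446, .737), lies close to the Clopper-Pearson interval (.45, .74) quoted in Problem 3.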

If the population happened to be finite, of size N, we would have to
correct our standard deviation wherever it occurs by multiplying it by

$$\sqrt{\frac{N - n}{N - 1}}.$$

Consider the case when N = 101 and n = 37. This is sufficient to
assume approximate normality (usually assumed when n ≥ 30 and N ≥ 100).
However, the correction factor becomes √((101 - 37)/(101 - 1)) = √.64 = 0.8.
Hence we must replace the standard deviation in the above confidence interval
estimate by .8 of itself. Only when N is very large compared with n is
the factor nearly 1 and hence negligible.
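The correction factor in a short sketch; `fpc` is a hypothetical name, and the second call simply shows the factor approaching 1 when N is large compared with n.

```python
from math import sqrt

def fpc(N, n):
    """Finite population correction factor sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))

print(round(fpc(101, 37), 3))      # the text's example
print(round(fpc(100_000, 37), 3))  # N much larger than n: factor close to 1
```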

In 1934 Clopper and Pearson, in Biometrika, constructed intervals
of the type just discussed for p and presented graphs for 95% and 97.5%
confidence levels of p for some values of n from 10 to 1000. Instead of
x, they used the sample estimate p̂ = x/n.

E.  The  Poisson  Distribution. 

A discussion similar to that just given for the binomial distribution
can be made for the case when the base population is Poisson distributed.
W. E. Ricker, following the original lines of Clopper and Pearson, presented
this in 1937 in the Journal of the American Statistical Association.
He gave the formula

$$x + 1.92 \pm 1.960\sqrt{x + 1.0}$$

for the 95% confidence limits of μ = λ for an observed value of x, while
for 99% confidence he gave

$$x + 3.32 \pm 2.576\sqrt{x + 1.7}.$$

Actually, Professor Pearson suggested this to him via the fact that the
Poisson distribution gets more and more normal as the mean λ increases,
so that the end-points of our random interval for a confidence of 1 - 2α
are the roots of

$$\lambda^2 - \lambda(2x + z_\alpha^2) + x^2 = 0.$$

Solving gives λ = x + z_α²/2 ± z_α√(x + z_α²/4); with z_α = 1.960 this is
x + 1.92 ± 1.960√(x + 0.96), and with z_α = 2.576 it is x + 3.32 ± 2.576√(x + 1.66),
which are Ricker's formulas after rounding.

Limiting ourselves to large values of x that might occur in a sample, we
sometimes consider the result as an estimate of the mean, hence also of
the variance, of the assumed base Poisson distribution to which it belonged.
Then the estimating random interval end-points, say for 95% confidence,
are taken to be

$$x \pm 1.960\sqrt{x}.$$
By way of comparison, for x = 50 the estimators by these two
essentially different methods are given in Table V.

Table V

                        95% Confidence              99% Confidence
Formulas Used      Lower Limit  Upper Limit    Lower Limit  Upper Limit
Pearson-Ricker         37.9         65.9           34.8         71.8
Old Method             36.1         63.9           31.8         68.2
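Both rows of Table V can be reproduced from the formulas above. This sketch uses the unrounded z²/2 and z²/4 rather than Ricker's 1.92, 1.0, 3.32 and 1.7; to one decimal the limits agree. The function names are hypothetical.

```python
from math import sqrt

def pearson_ricker(x, z):
    """Roots in λ of λ² - λ(2x + z²) + x² = 0."""
    center = x + z * z / 2
    half = z * sqrt(x + z * z / 4)
    return center - half, center + half

def old_method(x, z):
    """x ± z·√x, treating the observed count as both mean and variance."""
    return x - z * sqrt(x), x + z * sqrt(x)

for label, z in (("95%", 1.960), ("99%", 2.576)):
    pr = [round(v, 1) for v in pearson_ricker(50, z)]
    old = [round(v, 1) for v in old_method(50, z)]
    print(label, "Pearson-Ricker", pr, "Old Method", old)
```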

F.  Examples  Using  Confidence. 

1. Problem 1. Suppose you know a certain part has its quarterly
demands uniformly distributed over some interval of demand sizes
whose smallest value is zero. You wish to estimate the upper end-point
of the interval; call it a. Now suppose you have a sample of size 20 and
its mean is 3.2. What are the 90% confidence limits for a?

a. Solution. The density function can be written

$$f(x) = \frac{1}{a}, \qquad 0 \le x \le a.$$

Now its mean and variance are easily computed to be

$$\mu = \int_0^a x\,\frac{1}{a}\,dx = \frac{a}{2}, \qquad \sigma^2 = \int_0^a x^2\,\frac{1}{a}\,dx - \mu^2 = \frac{a^2}{12}.$$

For 1 - 2α = .90, we find z_α = 1.645. For samples of size n, the standard
deviation of the means of such samples is

$$\sigma_{\bar{x}} = \frac{a/\sqrt{12}}{\sqrt{n}} = \frac{a}{\sqrt{12n}}.$$

Hence

$$\Pr\left\{-1.645 < \frac{\bar{x} - a/2}{a/\sqrt{12n}} < +1.645\right\} \approx .90.$$

So the two values we seek for bounding a are

$$a = \frac{\bar{x}}{\tfrac{1}{2} \pm 1.645/\sqrt{12n}}.$$

When n = 20 and x̄ = 3.2, this gives the values 5.3 and 8.1. So our confidence
interval estimate for a is (5.3, 8.1).
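The last step of Problem 1 in code: solve x̄ = a(1/2 ± z/√(12n)) for a. A sketch only; the function name is hypothetical, and it relies, as the solution does, on x̄ being approximately normal for n = 20.

```python
from math import sqrt

def uniform_upper_limits(xbar, n, z=1.645):
    """Confidence limits for a in U(0, a), from x̄ = a(1/2 ± z/√(12n))."""
    k = z / sqrt(12 * n)
    # The larger denominator gives the lower limit for a, the smaller one the upper.
    return xbar / (0.5 + k), xbar / (0.5 - k)

low, high = uniform_upper_limits(3.2, 20)
print(f"({low:.1f}, {high:.1f})")  # prints (5.3, 8.1)
```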

2. Problem 2. A sample of 100 stock items indicated that 55% were on
hand. Find 95% confidence limits for the proportion of on-hand items
in the entire stock.

a. Solution. Like Problem 1, this calls for a two-sided interval
estimator. The estimate as given by the formulation on page 14 is

$$.55 \pm 1.96\sqrt{\frac{(.55)(.45)}{100}} = .55 \pm .10.$$

Therefore we can be 95% confident that the true proportion lies in the
interval (.45, .65).

3. Problem 3. An analysis of 40 randomly selected requisition
cards revealed that 24 were from the same Navy Supply Center. Find
95% confidence limits for the actual proportion of such cards to be expected
in the long run from this same center.

a. Solution. If we assume the binomial distribution with
n = 40 and p̂ = 24/40 = .6, then the Clopper-Pearson tables give us the
interval estimator (.45, .74). If instead we assume the normal distribution
and use the approximating formula, we get

$$.60 \pm 1.96\sqrt{\frac{(.6)(.4)}{40}} = .60 \pm .15,$$

or the confidence interval (.45, .75).
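The normal-approximation formula used in Problems 2 and 3, as a minimal sketch; `proportion_interval` is a hypothetical name.

```python
from math import sqrt

def proportion_interval(p_hat, n, z):
    """Normal-approximation interval p̂ ± z·sqrt(p̂(1 - p̂)/n)."""
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

for label, (p_hat, n) in (("Problem 2", (0.55, 100)), ("Problem 3", (24 / 40, 40))):
    low, high = proportion_interval(p_hat, n, 1.96)
    print(label, (round(low, 2), round(high, 2)))
```

For Problem 3 the normal formula gives (.45, .75), slightly wider at the top than the exact Clopper-Pearson value .74.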

4. Problem 4. Suppose our random variable x is gamma distributed
and that from a sample we find the first decile (10% cumulation) is 1.33 while
the ninth decile (90% cumulation) is 5.62. Find the shape and scale
parameters.

a. Solution. This is a deterministic problem in that the two
empirical values given completely determine α and β. To see this,
compute the quotient of the 10th percentile value and the 90th
percentile value, namely 1.33/5.62, or .236. Note in Table VI that this
value of the ratio is found in the right-hand column (D1/D9), opposite α = 2.5.
Now in Table VII, opposite α = 2.5, we find that x/β is 1.417 at the 10th
percentile and 6.008 at the 90th percentile. This overdetermined
situation gives the following two equations for β:

$$\frac{1.33}{\beta} = 1.417 \qquad \text{and} \qquad \frac{5.62}{\beta} = 6.008,$$

$$\beta = \frac{1.33}{1.417} = .939, \qquad \beta = \frac{5.62}{6.008} = .935.$$

Since each yields essentially the same value, .94, we accept it and feel
some justification in the assumption of the gamma distribution. No
confidence estimation was used here.
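The table lookup of Problem 4 can be mechanized. This sketch carries only a three-row excerpt of Tables VI and VII (α = 2.0, 2.5, 3.0) and picks the nearest tabulated ratio rather than interpolating; the dictionary and function names are hypothetical.

```python
# Excerpts: Table VI (α -> D1/D9) and Table VII (α -> x/β at F = .10 and F = .90).
D1_OVER_D9 = {2.0: 0.207, 2.5: 0.236, 3.0: 0.261}
X_OVER_BETA = {2.0: (1.102, 5.322), 2.5: (1.417, 6.008), 3.0: (1.745, 6.681)}

def fit_gamma_from_deciles(d1, d9):
    """Return α (nearest tabulated value) and the two estimates of β."""
    ratio = d1 / d9
    alpha = min(D1_OVER_D9, key=lambda a: abs(D1_OVER_D9[a] - ratio))
    q10, q90 = X_OVER_BETA[alpha]
    # Each decile gives its own equation decile / β = tabulated x/β.
    return alpha, d1 / q10, d9 / q90

alpha, beta_from_d1, beta_from_d9 = fit_gamma_from_deciles(1.33, 5.62)
print(alpha, round(beta_from_d1, 3), round(beta_from_d9, 3))
```

Close agreement between the two β values is the informal check on the gamma assumption described above.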

5. Problem 5. We know that a demand random variable is normally
distributed N(μ, σ²) and σ² = 100. A sample of size 25 is drawn and the
observed mean x̄ is 250. Find 95% confidence limits for the unknown
population mean μ.

a. Solution. Since the sample size is close to the border
value of 30 beyond which we usually assume the sample means are normally
distributed, we might as well invoke the same hypothesis. (In fact, since the
demand itself is normal, x̄ is normally distributed for any sample size.) Then
our confidence interval becomes

$$\mu = 250 \pm \frac{1.96(10)}{\sqrt{25}} = 250 \pm 3.9,$$

or (246.1, 253.9).

6. Problem 6. From a population with unknown parameter p representing
the proportion having an attribute, a sample of 400 yields 320 with
this attribute. Find 90% confidence limits for p, the true probability of
the attribute.

a. Solution. Denoting our empirical value of p by p̂, our
90% confidence interval is given by

$$\hat{p} \pm 1.645\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},$$

which for p̂ = .8 and n = 400 becomes (.767, .833).


Table VI

Ratios Facilitating the Estimation of the Parameters α, β of the Gamma Distribution

   α      D1/M    D5/M    D9/M    D1/D5   D9/D5   D1/D9
  -0.5    Curve J-shaped          .0348   5.960   .0058
   0      Curve J-shaped          .152    3.323   .0455
   0.5     *      2.366   6.252   .247    2.642   .0934
   1.0    .532    1.678   3.890   .317    2.318   .137
   1.5    .537    1.451   3.079   .370    2.122   .174
   2.0    .551    1.337   2.661   .412    1.990   .207
   2.5    .567    1.269   2.403   .447    1.893   .236
   3.0    .582    1.224   2.227   .475    1.819   .261
   3.5    .595    1.192   2.098   .500    1.760   .284
   4.0    .608    1.168   1.999   .521    1.711   .304
   4.5    .620    1.149   1.920   .539    1.671   .323
   5.0    .630    1.134   1.855   .556    1.636   .340
   5.5    .640    1.122   1.801   .571    1.606   .355
   6.0    .649    1.112   1.755   .584    1.579   .370
   6.5    .657    1.103   1.716   .596    1.556   .383
   7.0    .665    1.096   1.682   .607    1.535   .396
   7.5    .672    1.089   1.651   .617    1.516   .407
   8.0    .679    1.084   1.624   .627    1.499   .418
   8.5    .685    1.079   1.600   .635    1.483   .428
   9.0    .691    1.074   1.578   .643    1.469   .438
   9.5    .697    1.070   1.559   .651    1.456   .447
  10.0    .702    1.067   1.541   .658    1.444   .456
  11.0    .712    1.061   1.509   .671    1.423   .472
  12.0    .720    1.056   1.482   .682    1.404   .486
  13.0    .728    1.051   1.458   .693    1.387   .500
  14.0    .736    1.048   1.438   .702    1.372   .512
  15.0    .742    1.045   1.420   .711    1.359   .523
  20.0    .769    1.033   1.352   .744    1.309   .569
  25.0    .789    1.027   1.308   .768    1.274   .603
  30.0    .804    1.022   1.277   .786    1.249   .629

*Mode to left of D1. Where: Di = ith decile; M = mode.
Table VII

                                     F
   α       .05      .10      .25      .50      .75      .90      .95
  -0.5   .00197   .00790   .0508    .227     .662    1.353    1.921
   0     .0513    .105     .288     .693    1.386    2.303    2.996
   0.5   .176     .292     .606    1.183    2.054    3.126    3.907
   1.0   .355     .532     .961    1.678    2.693    3.890    4.744
   1.5   .573     .805    1.337    2.176    3.313    4.618    5.535
   2.0   .818    1.102    1.727    2.674    3.920    5.322    6.296
   2.5  1.084    1.417    2.127    3.173    4.519    6.008    7.034
   3.0  1.366    1.745    2.535    3.672    5.109    6.681    7.754
   3.5  1.663    2.084    2.949    4.171    5.694    7.342    8.460
   4.0  1.970    2.433    3.369    4.671    6.274    7.994    9.154
   4.5  2.287    2.789    3.792    5.170    6.850    8.638    9.838
   5.0  2.613    3.152    4.219    5.670    7.423    9.275   10.513
   5.5  2.946    3.521    4.650    6.170    7.992    9.906   11.181
   6.0  3.285    3.895    5.083    6.670    8.558   10.532   11.842
   6.5  3.630    4.273    5.518    7.169    9.123   11.154   12.498
   7.0  3.981    4.656    5.956    7.669    9.684   11.771   13.148
   7.5  4.336    5.043    6.396    8.169   10.244   12.384   13.794
   8.0  4.695    5.432    6.838    8.669   10.802   12.995   14.435
   8.5  5.058    5.825    7.281    9.169   11.359   13.602   15.072
   9.0  5.425    6.221    7.726    9.669   11.914   14.206   15.705
   9.5  5.796    6.620    8.172   10.169   12.467   14.808   16.335
  10.0  6.169    7.021    8.620   10.668   13.020   15.407   16.962
  11.0  6.924    7.829    9.519   11.668   14.121   16.598   18.208
  12.0  7.690    8.646   10.422   12.668   15.217   17.782   19.443
  13.0  8.464    9.470   11.329   13.668   16.310   18.958   20.669
  14.0  9.246   10.300   12.239   14.668   17.400   20.128   21.886
  15.0 10.035   11.135   13.152   15.668   18.487   21.293   23.098
  20.0 14.072   15.382   17.755   20.668   23.883   27.045   29.062
  25.0 18.218   19.717   22.404   25.667   29.234   32.711   34.916
  30.0 22.444   24.113   27.085   30.667   34.552   38.335   40.691


Sometimes we want to compare the means of two samples. Really
we should say we are interested in how great the difference between
the means of their base populations may be. Denote these two means by μ1 and
μ2, and denote the sample sizes by n1 and n2 and the sample means by x̄1
and x̄2, respectively. If the samples are large, then we learned
in the previous course what the variance of x̄1 - x̄2 is, in either a finite
population or an indefinitely large population, in terms of the variance of
each population. At that time we also remarked that the difference x̄1 - x̄2
is essentially normally distributed. Hence

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$

is N(0, 1), or

$$\Pr\left\{-z_c < \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} < +z_c\right\} = c.$$

The above symbolism is a slight break with the convention of writing α
for c when it is the subscript and of writing 1 - 2α for c on the right
side of the last expression. It seems more natural to write simply what
we just did and realize that for, say, c = .90, you must pick z_c so that
F(z_c) = .95.

From our discussion in the previous course you will recall we use

$$\sigma_{\bar x_1 - \bar x_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

for indefinitely large populations and

$$\sigma_{\bar x_1 - \bar x_2} = \sqrt{\frac{\sigma_1^2}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)}$$

for finite populations.

Rewriting our previous probability statement, we obtain the confidence interval estimate for $\mu_1 - \mu_2$,

$$\Pr\left\{\bar x_1 - \bar x_2 - z_c\,\sigma_{\bar x_1 - \bar x_2} < \mu_1 - \mu_2 < \bar x_1 - \bar x_2 + z_c\,\sigma_{\bar x_1 - \bar x_2}\right\} = c.$$

When $\sigma_1^2$ and $\sigma_2^2$ are not known, we may replace them by their sample estimates, $s_1^2$ and $s_2^2$, respectively.

1. Illustration. For a particular Federal Stock Number (FSN) we find that out of 580 orders in one year, the mean demand (average requisition size) is 34.45 units per order and the standard deviation is 8.83, while in the succeeding year, from 786 orders, the mean demand is 28.02 and the standard deviation is 8.81. What are the 95% confidence limits for the difference of the means of the two conceptually different populations?

a. Solution. For c = .95 we have $z_c = 1.96$. Therefore the limits sought are

$$(34.45 - 28.02) \pm 1.96\sqrt{\frac{(8.83)^2}{580} + \frac{(8.81)^2}{786}}$$

or

$$6.43 \pm 1.96 \times .483, \quad \text{that is,} \quad 6.43 \pm .95.$$

So we are 95% confident that, if there are two different patterns of behavior, one for each year, the means differ by no less than 5.48 and by no more than 7.38.
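The arithmetic of this illustration can be checked with a short Python sketch. The function name `diff_means_ci` is ours, not the manual's, and the sketch assumes samples large enough that $s_1^2$ and $s_2^2$ may stand in for $\sigma_1^2$ and $\sigma_2^2$ in the indefinitely-large-population formula.

```python
import math

def diff_means_ci(xbar1, s1, n1, xbar2, s2, n2, z_c=1.96):
    """Large-sample confidence interval for mu1 - mu2.

    Uses sigma_{x1 - x2} = sqrt(s1^2/n1 + s2^2/n2): the indefinitely
    large population formula with sample variances in place of the
    unknown population variances.
    """
    diff = xbar1 - xbar2
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    half_width = z_c * se
    return diff - half_width, diff + half_width, se

# Illustration 1: two years of orders for one FSN, c = .95
low, high, se = diff_means_ci(34.45, 8.83, 580, 28.02, 8.81, 786)
print(f"se = {se:.3f}, interval = ({low:.2f}, {high:.2f})")
```

Using the finite-population formula instead would change only the line that computes `se`.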

If we are interested in confidence limits for the difference of two population proportions, $p_1$ and $p_2$, then this is really just another application of the general theory we just learned. The excuse for remarking on it lies in the simplicity of the resulting expression. Suppose $\hat p_1$ and $\hat p_2$ are the sample estimates from samples of size $n_1$ and $n_2$, respectively. Then

$$\frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sigma_{\hat p_1 - \hat p_2}}$$

is N(0, 1), and so

$$\Pr\left\{-z_c < \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sigma_{\hat p_1 - \hat p_2}} < +z_c\right\} = c.$$

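Now using the sample estimates for our required variances, we have

$$\sigma_{\hat p_1 - \hat p_2} \approx \sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}}$$

for indefinitely large populations, or

$$\sigma_{\hat p_1 - \hat p_2} \approx \sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\hat p_2(1 - \hat p_2)}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)}$$

for large finite populations.

Rewriting our previous probability statement, we obtain the confidence interval estimate for $p_1 - p_2$,

$$\Pr\left\{\hat p_1 - \hat p_2 - z_c\,\sigma_{\hat p_1 - \hat p_2} < p_1 - p_2 < \hat p_1 - \hat p_2 + z_c\,\sigma_{\hat p_1 - \hat p_2}\right\} = c.$$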
2. Illustration. One FSN was ordered on 230 days out of 400 days, while another was requested on 200 out of 500 days. Find 95% confidence intervals for the difference between the conceptual rates of demand.


a. Solution. Assume indefinitely large populations. For c = .95 we have $z_c = 1.96$. Therefore the limits sought are

$$\left(\frac{230}{400} - \frac{200}{500}\right) \pm 1.96\sqrt{\frac{(230/400)(1 - 230/400)}{400} + \frac{(200/500)(1 - 200/500)}{500}}$$

or

$$(.575 - .400) \pm 1.96\sqrt{\frac{(.575)(.425)}{400} + \frac{(.400)(.600)}{500}} = .175 \pm .065.$$

So for all practical purposes we might say the difference between the mean demand rates lies between .110 and .240.
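A similar sketch handles proportions. The function name is again hypothetical, and the code assumes indefinitely large populations, so no finite-population factors appear. It reproduces the .175 ± .065 of the solution.

```python
import math

def diff_proportions_ci(x1, n1, x2, n2, z_c=1.96):
    """Confidence interval for p1 - p2 using the sample proportions
    in the variance (indefinitely large populations)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_c * se, diff + z_c * se

# Illustration 2: demand on 230 of 400 days versus 200 of 500 days
low, high = diff_proportions_ci(230, 400, 200, 500)
print(f"({low:.3f}, {high:.3f})")
```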
II. SMALL SAMPLE THEORY

A. Some History of Research and Development.


In 1908, in the paper entitled "The Probable Error of a Mean" appearing in Biometrika, W. S. Gossett, alias "Student," wrote

"Any experiment may be regarded as forming an individual of a "population" of experiments which might be performed under the same conditions. A series of experiments is a sample drawn from this population.

"Now any series of experiments is only of value in so far as it enables us to form a judgment as to the statistical constants of the population to which the experiments belong. In a greater number of cases the question finally turns on the value of a mean, either directly, or as the mean difference between the two quantities.

"If the number of experiments be very large, we may have precise information as to the value of the mean, but if our sample be small, we have two sources of uncertainty: (1) owing to the "error of random sampling" the mean of our series of experiments deviates more or less widely from the mean of the population, and (2) the sample is not sufficiently large to determine what is the law of distribution of individuals. It is usual, however, to assume a normal distribution, because, in a very large number of cases, this gives an approximation so close that a small sample will give no real information as to the manner in which the population deviates from normality: since some law of distribution must be assumed it is better to work with a curve whose area and ordinates are tabled, and whose properties are well known. This assumption is accordingly made in the present paper, so that its conclusions are not strictly applicable to populations known not to be normally distributed; yet it appears probable that the deviation from normality must be very extreme to lead to serious error. We are concerned here solely with the first of these two sources of uncertainty.

"The usual method of determining the probability that the mean of the population lies within a given distance of the mean of the sample is to assume a normal distribution about the mean of the sample with a standard deviation equal to $s/\sqrt{n}$, where s is the standard deviation of the sample, and to use the tables of the probability integral.

"But as we decrease the number of experiments, the value of the standard deviation found from the sample of experiments becomes itself subject to an increasing error, until judgments reached in this way become altogether misleading."

A few paragraphs later, Mr. Gossett goes on to say

"Again, although it is well known that the method of using the normal curve is only trustworthy when the sample is "large," no one has yet told us very clearly where the limit between "large" and "small" is to be drawn.

"The aim of the present paper is to determine the point at which we may use the tables of the probability integral in judging of the significance of the mean of a series of experiments, and to furnish alternative tables for use when the number of experiments is too few."

The reader must be wondering by now why we classify this concern under the heading "Small Sample Theory." Actually, it is not the size of the sample that is the basic concern; it is the estimating of the base population standard deviation from the sample, and this estimate goes to the lean side when the sample size is small. However, if we know the standard deviation $\sigma$ of the base population whose mean we are attempting to ascertain, and if the base population is essentially normal, then the means of samples of any size are normally distributed, and we use $z_c$ for our confidence c together with the base population standard deviation divided by $\sqrt{n}$, that is, $\sigma/\sqrt{n}$.

The problem arises, as Gossett said, when the base population standard deviation is unknown, even though the population is assumed normal, because then we cannot use the $z_c$ confidence limits, since $\sigma$ is not known. That is, we don't know when we can use them, supposing there are such times, and further, when we can't use them, we need to know how to modify $z_c$ to get confidence c.

The sum and substance of the mathematical problem is to find, for a sample $(x_1, x_2, \ldots, x_n)$ of size n from a population $N(\mu, \sigma^2)$, the theoretical distribution of the random variable

$$t = \frac{\bar x - \mu}{s/\sqrt{n}},$$

where s is the standard deviation of the sample set of numbers.

Fortunately, it can be proven that this distribution function does not involve $\sigma$, the population standard deviation. Mr. Gossett first obtained the distribution of $s^2$ in random samples after having examined many empirical situations. He did this by using the relation connecting the first four moments of the Pearson Type III curve

$$y = A(x - \mu)^{\lambda - 1}e^{-\alpha(x - \mu)}, \qquad x > \mu,\ \alpha > 0,\ \lambda > 0,$$

which generalizes the gamma distribution and hence also the chi-square distribution to be studied later. Knowledge of the first four moments of any frequency function belonging to Pearson's system is sufficient to determine that function.

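Tediously, as Gossett puts it, he obtained the moments ($M_i$) of $s^2$ about its mean to be, in order,

$$M_1 = 0, \qquad M_2 = \frac{2\mu_2^2(n - 1)}{n^2}, \qquad M_3 = \frac{8\mu_2^3(n - 1)}{n^3}, \qquad M_4 = \frac{12\mu_2^4(n - 1)(n + 3)}{n^4},$$

where $\mu_2$ is the population variance. (Since he used the biased formula, with divisor n, he calculated the mean of $s^2$ to be $\mu_2(n - 1)/n$.) It follows that

$$\beta_1 = \frac{M_3^2}{M_2^3} = \frac{8}{n - 1}, \qquad \beta_2 = \frac{M_4}{M_2^2} = \frac{3(n + 3)}{n - 1},$$

where $\beta_1$ and $\beta_2$ are "shape predictors" not sensitive to the magnitude of the data. These values satisfy the Pearson criterion

$$2\beta_2 - 3\beta_1 - 6 = 0$$

for a Type III curve. Consequently Gossett said he believed that $s^2$ followed the law

$$y = Cx^p e^{-\gamma x},$$

where

$$\gamma = \frac{2M_2}{M_3} = \frac{4\mu_2^2(n - 1)\,n^3}{8\mu_2^3(n - 1)\,n^2} = \frac{n}{2\mu_2}, \qquad p = \frac{4}{\beta_1} - 1 = \frac{n - 1}{2} - 1 = \frac{n - 3}{2}.$$

Consequently he got

$$f_{s^2}(x) = Cx^{\frac{n-3}{2}}e^{-\frac{nx}{2\mu_2}}.$$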
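Gossett's shape argument can be checked with exact rational arithmetic. This sketch sets the population variance $\mu_2$ to 1 (since $\beta_1$ and $\beta_2$ are free of scale, $\gamma$ then comes out as $n/2$) and uses the standard Type III relations $p = 4/\beta_1 - 1$ and $\gamma = 2M_2/M_3$. The function name is our own.

```python
from fractions import Fraction

def s2_moment_shape(n):
    """Exact shape constants for the biased s^2 of a normal sample of size n,
    with the population variance mu_2 taken as 1."""
    n = Fraction(n)
    M2 = 2 * (n - 1) / n**2                  # second moment of s^2 about its mean
    M3 = 8 * (n - 1) / n**3                  # third moment
    M4 = 12 * (n - 1) * (n + 3) / n**4       # fourth moment
    beta1 = M3**2 / M2**3
    beta2 = M4 / M2**2
    gamma = 2 * M2 / M3                      # Type III rate parameter
    p = 4 / beta1 - 1                        # Type III exponent
    return beta1, beta2, gamma, p

for n in range(4, 12):
    beta1, beta2, gamma, p = s2_moment_shape(n)
    assert beta1 == Fraction(8, n - 1)
    assert beta2 == Fraction(3 * (n + 3), n - 1)
    assert 2 * beta2 - 3 * beta1 - 6 == 0    # Pearson's Type III criterion
    assert gamma == Fraction(n, 2) and p == Fraction(n - 3, 2)
print("all checks passed")
```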


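Gossett reasoned that the distribution of s may be found from that of $s^2$, since the frequency of s is that of $s^2$, and all we must do is to compress the base line suitably. Let

$$y_1 = \phi(s^2), \qquad y_2 = \psi(s).$$

Then $y_1\,d(s^2) = y_2\,ds$, so that $y_2 = 2s\,y_1$ and

$$y_2 = 2Cs\,(s^2)^{\frac{n-3}{2}}e^{-\frac{ns^2}{2\mu_2}} = 2Cs^{n-2}e^{-\frac{ns^2}{2\mu_2}},$$

or

$$f_s(x) = Ax^{n-2}e^{-\frac{nx^2}{2\mu_2}}.$$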
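Next he derived the distribution of $z = \bar x/s$, the distance of the sample mean from the population mean (taken as the origin) measured in units of s, for which he got

$$y(z) = \begin{cases} \dfrac{n-2}{n-3}\cdot\dfrac{n-4}{n-5}\cdots\dfrac{3}{2}\cdot\dfrac{1}{2}\,(1 + z^2)^{-n/2}, & n \text{ odd},\\[2ex] \dfrac{n-2}{n-3}\cdot\dfrac{n-4}{n-5}\cdots\dfrac{4}{3}\cdot\dfrac{2}{1}\cdot\dfrac{1}{\pi}\,(1 + z^2)^{-n/2}, & n \text{ even}, \end{cases}$$

or

$$y(z) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\,(1 + z^2)^{-n/2},$$

which has the following descriptive values:

$$\sigma = \frac{1}{\sqrt{n - 3}}, \qquad \beta_1 = 0, \qquad \mu_4 = \frac{3}{(n - 3)(n - 5)}, \qquad \beta_2 = 3 + \frac{6}{n - 5}.$$

And it is symmetric about zero, so if we wanted to fit a normal curve to it, we would use the given formula for $\sigma$, $\sigma = 1/\sqrt{n - 3}$. Remember, $\Gamma(n + 1) = n\Gamma(n)$ was generalized from the case when n is a positive integer, and then $\Gamma(n + 1) = n!$.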
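As a check that the two forms of y(z) describe the same curve, the following Python sketch compares the normalizing constant from the odd/even product formula with the gamma-function form for several n. It is our own verification, not Gossett's computation, and the function names are hypothetical.

```python
import math

def constant_from_product(n):
    """Normalizing constant of y(z) from the product formula.

    n odd:  (n-2)/(n-3) * (n-4)/(n-5) * ... * 3/2 * 1/2
    n even: (n-2)/(n-3) * (n-4)/(n-5) * ... * 2/1 * 1/pi
    """
    last = 3 if n % 2 else 2            # final numerator in the product
    const = 1.0
    for k in range(n - 2, last - 1, -2):
        const *= k / (k - 1)
    return const * (0.5 if n % 2 else 1 / math.pi)

def constant_from_gamma(n):
    """Gamma(n/2) / (sqrt(pi) * Gamma((n-1)/2))."""
    return math.gamma(n / 2) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2))

for n in range(2, 12):
    print(n, constant_from_product(n), constant_from_gamma(n))
```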

Now Gossett's original papers suffered from two defects:

1. As n increases, the z-scale becomes very compressed.

2. Except in the case for which it was designed, n, the number in the sample, is not the best number under which to enter the table; n - 1, the number of degrees of freedom, is.

So, at Fisher's suggestion, new tables were constructed with argument $t = z\sqrt{n'}$, where $n'$ is one less than the number in the sample (Gossett temporarily called it $n'$). So if we switch from z to the more familiar t, then we could say that the new variable and the old variable are related by

$$t^2 = (n - 1)z^2, \qquad dt = \sqrt{n - 1}\,dz.$$

Moreover, we get to see again how, when we stretch (or compress) units horizontally, we must do just the opposite vertically to preserve area. In this case the distribution of t is found from that of z, since the frequency of t is equal to that of z, so all we have to do is to expand the base line suitably. So we find written in many books

$$f(t) = \frac{\Gamma\left(\frac{n}{2}\right)}{\sqrt{(n - 1)\pi}\,\Gamma\left(\frac{n-1}{2}\right)}\left(1 + \frac{t^2}{n - 1}\right)^{-n/2},$$

which may not be as appealing to some people as the original form of Mr. Gossett.


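The new descriptive values are

$$\sigma^2 = \frac{n - 1}{n - 3} \quad\text{or}\quad \sigma = \sqrt{\frac{n - 1}{n - 3}}, \qquad \beta_1 = 0, \qquad \mu_4 = \frac{3(n - 1)^2}{(n - 3)(n - 5)}, \qquad \beta_2 = 3 + \frac{6}{n - 5}.$$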

The parameter n - 1 is called the degrees of freedom. For small n this t-distribution differs considerably from the unit normal distribution, which it approaches as n increases without limit. In Figure 4 the graph for n = 4 is compared with that of the limiting normal. Here it can be seen that the probability of a large deviation from the mean is much larger in the t-distribution than in the normal case.
Figure 4
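The heavier tails can be made concrete. This minimal sketch assumes the standard Student density with `df` degrees of freedom and integrates it with a plain Simpson's rule, truncated at |t| = 200 (both our own choices). It then compares P(|t| > 2) for n = 4, that is, 3 degrees of freedom, with P(|z| > 2) for the unit normal.

```python
import math

def t_density(t, df):
    """Student's t density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_tail(density, a, upper=200.0, steps=20_000):
    """P(|X| > a) for a density symmetric about zero, by Simpson's rule on [a, upper]."""
    h = (upper - a) / steps              # steps must be even for Simpson's rule
    total = density(a) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(a + i * h)
    return 2 * total * h / 3             # factor 2 covers both tails

t_tail = two_sided_tail(lambda x: t_density(x, 3), 2.0)
z_tail = math.erfc(2.0 / math.sqrt(2))   # P(|z| > 2) for N(0, 1)
print(f"P(|t| > 2), 3 d.f.: {t_tail:.4f}")
print(f"P(|z| > 2), normal: {z_tail:.4f}")
```

The t-tail comes out roughly three times the normal tail, which is the gap Figure 4 displays.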


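To see that the normal distribution is the limiting distribution, we write the density with n degrees of freedom, $f_n(t)$, as

$$f_n(t) = \left[\frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n/2}\;\Gamma\left(\frac{n}{2}\right)}\right]\frac{1}{\sqrt{2\pi}}\left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}.$$

The factor in brackets can be shown to approach unity and, for every fixed t,

$$-\frac{n + 1}{2}\log\left(1 + \frac{t^2}{n}\right) \to -\frac{t^2}{2}.$$

Hence

$$f_n(t) \to \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}.$$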
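Both limits can be watched numerically. In this sketch the function names are ours. The bracketed factor is computed through `math.lgamma` to avoid overflow for large n, and the t density at t = 1 is compared with the unit normal density.

```python
import math

def bracket_factor(n):
    """Gamma((n+1)/2) / (sqrt(n/2) * Gamma(n/2)), via log-gamma to avoid overflow."""
    return math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2)) / math.sqrt(n / 2)

def t_density(t, n):
    """Student's t density with n degrees of freedom, factored as in the text."""
    return bracket_factor(n) / math.sqrt(2 * math.pi) * (1 + t * t / n) ** (-(n + 1) / 2)

def normal_density(t):
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

for n in (3, 10, 100, 1000):
    gap = t_density(1.0, n) - normal_density(1.0)
    print(f"n = {n:4d}  bracket = {bracket_factor(n):.5f}  f_n(1) - phi(1) = {gap:+.5f}")
```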

All of the above remarks can be gleaned from Table VIII, which gives $t_c$ for confidence c. Consistent with our notation heretofore, let us write

$$f_{t,n-1}(t)$$

for "the probability density function of the random variable known as
Table VIII. Values of t_c for Centered Confidence Interval

Sample    Degrees of       c
Size      Freedom
n         n - 1            .99       .95       .90

2         1                63.657    12.706    6.314
3         2                9.925     4.303     2.920
4         3                5.841     3.182     2.353
5         4                4.604     2.776     2.132
6         5                4.032     2.571     2.015
7         6                3.707     2.447     1.943
8         7                3.499     2.365     1.895
9         8                3.355     2.306     1.860
10        9                3.250     2.262     1.833
11        10               3.169     2.228     1.812
12        11               3.106     2.201     1.796
13        12               3.055     2.179     1.782
14        13               3.012     2.160     1.771
15        14               2.977     2.145     1.761
16        15               2.947     2.131     1.753
17        16               2.921     2.120     1.746
18        17               2.898     2.110     1.740
19        18               2.878     2.101     1.734
20        19               2.861     2.093     1.729
21        20               2.845     2.086     1.725
22        21               2.831     2.080     1.721
23        22               2.819     2.074     1.717
24        23               2.807     2.069     1.714
25        24               2.797     2.064     1.711
26        25               2.787     2.060     1.708
27        26               2.779     2.056     1.706
28        27               2.771     2.052     1.703
29        28               2.763     2.048     1.701
30        29               2.756     2.045     1.699
31        30               2.750     2.042     1.697
41        40               2.705     2.021     1.684
61        60               2.660     2.000     1.671
121       120              2.617     1.980     1.658
∞         ∞ (z_c)          2.576     1.960     1.645
Student's t when the sample size is n or the number of degrees of freedom is n - 1."

It is apparent from Table VIII that s underestimates $\sigma$ on the average for a fixed n. For any given confidence, as n decreases, the confidence coefficient $t_c$ increases. On the other hand, for large n we see that $t_c$ is practically $z_c$, and that in the limit the two are equal. Note also that $t_c$ settles down faster for smaller values of c. For example, when n goes from 10 to 11, $t_c$ changes by only .02 for c = .90, while it changes by about .08 for c = .99.

Gossett, in concluding remarks, expressed the belief that if the base population distribution is not normal and if, consequently, the mean and standard deviation of a sample have greater variability, still they will tend to counteract each other, a mean deviating more from the general mean tending to be divided by a larger standard deviation. Experience in subsequent years showed him correct for small samples of size less than 30 from populations sufficiently nearly normal.

B. Using the t-distribution.

So, if we want to estimate the mean $\mu$ of a base population by using a sample of size n whose mean is $\bar x$ and whose standard deviation is s, we simply decide on the desired confidence c, then look up $t_c$ for n - 1 degrees of freedom. It follows that we can say

$$\Pr\left\{-t_c < \frac{\bar x - \mu}{s/\sqrt{n}} < +t_c\right\} = c$$

or

$$\Pr\left\{\bar x - t_c\frac{s}{\sqrt{n}} < \mu < \bar x + t_c\frac{s}{\sqrt{n}}\right\} = c.$$

Once again we have a random interval

$$\left(\bar x - t_c\frac{s}{\sqrt{n}},\ \bar x + t_c\frac{s}{\sqrt{n}}\right),$$

which 100c% of the time should include $\mu$.

1. Illustration. A set of 11 requisitions for a particular stock number has a mean $\bar x = 4$ (average requisition size) and a standard deviation s = .6. What are the 95% confidence limits for the true mean $\mu$, the average requisition size for all requisitions for this stock number?

a. Argument. Since the degrees of freedom are n - 1 = 10 and c = .95, we find from Table VIII that $t_c = 2.228$. Therefore the 95% confidence limits for $\mu$ are

$$4 \pm 2.228\,\frac{.6}{\sqrt{11}} = 4 \pm .4,$$

or the estimating interval is (3.6, 4.4).
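The look-up-and-multiply recipe fits in a small Python function. The dictionary `T_95` copies a few c = .95 entries of Table VIII and is only an illustrative stand-in for the full table. The function name is hypothetical.

```python
import math

# A few c = .95 entries of Table VIII, keyed by degrees of freedom n - 1
# (an illustrative subset, not the whole table).
T_95 = {8: 2.306, 9: 2.262, 10: 2.228, 15: 2.131, 19: 2.093}

def mean_interval(xbar, s, n, t_table=T_95):
    """Return (xbar - t_c s/sqrt(n), xbar + t_c s/sqrt(n)) with t_c for n - 1 d.f."""
    t_c = t_table[n - 1]
    half_width = t_c * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

low, high = mean_interval(4.0, 0.6, 11)      # Illustration 1: n = 11, s = .6
print(f"({low:.1f}, {high:.1f})")
```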

Suppose $\bar x_1$ and $\bar x_2$ are the means of two samples $\{x_{1i}\}$ and $\{x_{2j}\}$ of sizes $n_1$ and $n_2$, respectively, drawn from normal base populations having the same standard deviation $\sigma$. Then

$$\frac{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma^2}{n_1} + \dfrac{\sigma^2}{n_2}}} = \frac{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{n_1 + n_2}{n_1 n_2}}}$$

will be normally distributed N(0, 1). Now we must estimate $\sigma$ from our sample data. You may recall that when we pooled two sets of data that had the same mean, we found the pooled variance to be the weighted arithmetic mean of the two variances, namely

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}.$$


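Consistent with this, if we let

$$S_1 = \sum_{i=1}^{n_1}(x_{1i} - \bar x_1)^2, \qquad S_2 = \sum_{j=1}^{n_2}(x_{2j} - \bar x_2)^2,$$

we estimate $\sigma$ by

$$s = \sqrt{\frac{S_1 + S_2}{n_1 + n_2 - 2}}.$$

When this sample standard deviation is used in place of $\sigma$, then

$$\frac{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

is distributed as "Student's t" with $n_1 + n_2 - 2$ degrees of freedom.

Therefore we can say

$$\Pr\left\{-t_c < \frac{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} < +t_c\right\} = c,$$

or the 100c% confidence interval estimate of $\mu_1 - \mu_2$ is

$$\left(\bar x_1 - \bar x_2 - t_c s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}},\ \ \bar x_1 - \bar x_2 + t_c s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\right).$$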


2.  Illustration.  Suppose  two  sets  of  quarterly  demand  observations  are 
25

Estimate the difference $\mu_1 - \mu_2$ on the assumption that they each came from a different base population.

a. Argument. First we calculate

$n_1 = 8$, $n_2 = 9$
$\bar x_1 = \bar D_1 = 17.62$, $\bar x_2 = \bar D_2 = 23.33$
$S_1 = 37.88$, $S_2 = 184.00$
$S_1 + S_2 = 221.88$
$\bar x_2 - \bar x_1 = 5.71$

Then the estimated variance of the difference between the means is given by

$$s^2\,\frac{n_1 + n_2}{n_1 n_2} = \frac{(S_1 + S_2)(n_1 + n_2)}{(n_1 + n_2 - 2)(n_1 n_2)} = \frac{(221.88)(17)}{(15)(9)(8)} = 3.49,$$

and the estimated standard deviation is 1.87. Hence for $n_1 + n_2 - 2 = 15$ degrees of freedom we have, for, say, 95% confidence, the interval estimate for $\mu_2 - \mu_1$ of

$$5.71 \pm 2.131(1.87) = 5.71 \pm 3.98,$$

or (1.73, 9.69).
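Here is a sketch of the pooled computation from summary figures. The function name is ours. $S_1 = 221.88 - 184.00 = 37.88$ is taken from the totals in the argument, and $t_c = 2.131$ is the Table VIII entry for 15 degrees of freedom.

```python
import math

def pooled_diff_ci(n1, n2, S1, S2, diff, t_c):
    """Interval for a difference of means using a pooled estimate of sigma.

    S1, S2 are sums of squared deviations about each sample mean, so
    s^2 = (S1 + S2) / (n1 + n2 - 2) has n1 + n2 - 2 degrees of freedom.
    """
    s2 = (S1 + S2) / (n1 + n2 - 2)
    se = math.sqrt(s2 * (n1 + n2) / (n1 * n2))   # equals s * sqrt(1/n1 + 1/n2)
    return diff - t_c * se, diff + t_c * se, se

# Illustration 2: difference x2bar - x1bar = 23.33 - 17.62, c = .95
low, high, se = pooled_diff_ci(8, 9, 37.88, 184.00, 23.33 - 17.62, t_c=2.131)
print(f"se = {se:.2f}, interval = ({low:.2f}, {high:.2f})")
```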
Another conclusion of a confidence type can be drawn. We are 95% confident that the means of the two assumed base populations are different, since the lower end-point of our estimating interval is positive. Had the value zero been included in our confidence interval, we could not have ruled out equal means: a difference as large as the one observed would then arise by chance more often than 5% of the time.

3.  Illustration.  Here  are  two  sets  of  demands  that  certainly  appear 
to  be  alike.  It  must  be  remembered  that  we  test  by  virtue  of  the  quotient 
of  our  sample  means'  difference  and  a  pooled  estimate  of  standard  deviation. 


D1       D2

79.98    80.02
80.04    79.94
80.02    79.98
80.04    79.97
80.03    79.97
80.03    80.03
80.04    79.95
79.97    79.97
80.05
80.03
80.02
80.00
80.02

a. Argument. We find $\bar D_1 = 80.02$, $\bar D_2 = 79.98$, $n_1 = 13$, $n_2 = 8$, $s_1^2 = .000574$, $s_2^2 = .000984$. Therefore our estimate of $\sigma$ is

$$s = \sqrt{\frac{12(.000574) + 7(.000984)}{19}} = \sqrt{.000725} = .0269.$$

Now for c = .95 and 19 degrees of freedom, we find from Table VIII that $t_c = 2.093$. So our error term becomes

$$2.093 \times .0269 \times \sqrt{\frac{1}{13} + \frac{1}{8}} = .025,$$

and we find, with 95% confidence, that these two sets of demands come from separate populations the difference of whose means is .04 ± .025. Or we can say we are 95% confident the true difference of the base population means lies in the interval (.015, .065).

Again  we  can  also  be  confident  that  the  two  base  populations  have 
different  means. 
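The same pooled interval can be computed directly from the raw demands with Python's `statistics` module. We read the listed values as 13 for $D_1$ and 8 for $D_2$, a split that reproduces the means and variances quoted in the argument. `t_c = 2.093` is the Table VIII entry for 19 degrees of freedom, and the function name is our own.

```python
import math
import statistics

# Illustration 3 demands, split into the two sets as described above
d1 = [79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04,
      79.97, 80.05, 80.03, 80.02, 80.00, 80.02]
d2 = [80.02, 79.94, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97]

def pooled_t_interval(x, y, t_c):
    """Return (difference of means, pooled s, half-width of the interval)."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)   # divisor n - 1
    s = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    half = t_c * s * math.sqrt(1 / n1 + 1 / n2)
    return statistics.mean(x) - statistics.mean(y), s, half

diff, s, half = pooled_t_interval(d1, d2, t_c=2.093)   # 19 d.f., c = .95
print(f"diff = {diff:.3f}, s = {s:.4f}, interval = ({diff - half:.3f}, {diff + half:.3f})")
```

Working with the unrounded difference (.042 rather than .04) shifts the interval slightly, to about (.017, .067).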


C.  Chi-square. 


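Of interest is the sample random variable

$$\frac{(x_1 - \bar x)^2 + (x_2 - \bar x)^2 + \cdots + (x_n - \bar x)^2}{\sigma^2},$$

which can be written equivalently as

$$\frac{(n - 1)s^2}{\sigma^2}.$$

When the samples of size n are drawn from a normal distribution with variance $\sigma^2$, this new random variable has its density function given by

$$f\left(\frac{(n - 1)s^2}{\sigma^2}\right) = \frac{1}{2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)}\left[\frac{(n - 1)s^2}{\sigma^2}\right]^{\frac{n-1}{2} - 1} e^{-\frac{(n-1)s^2}{2\sigma^2}}.$$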
Karl Pearson may have used an awkward symbol to replace this variable, but he wanted to "characterize a sum of squares." So he picked the Greek letter for "ch," which is $\chi$, and then put a 2 on it in exponent position. He called the symbol $\chi^2$, "chi-square," and wrote the above probability density as

$$f(\chi^2) = \frac{1}{2^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)}\,(\chi^2)^{\frac{n-3}{2}}\,e^{-\frac{\chi^2}{2}}.$$

We say this is "$\chi^2$ with n - 1 degrees of freedom" and that

$$\left(\frac{(n - 1)s^2}{\chi^2_{UC}},\ \frac{(n - 1)s^2}{\chi^2_{LC}}\right)$$

is a 100c% confidence interval for $\sigma^2$, while

$$\left(\frac{\sqrt{n - 1}\,s}{\sqrt{\chi^2_{UC}}},\ \frac{\sqrt{n - 1}\,s}{\sqrt{\chi^2_{LC}}}\right)$$

is a 100c% confidence interval for $\sigma$. Here $\chi^2_{LC}$ and $\chi^2_{UC}$ are the table values with cumulative probabilities (1 - c)/2 and (1 + c)/2, respectively. In keeping with our earlier method of notation we will write

$$f_{\chi^2,n-1}(\chi^2) \quad\text{for}\quad f(\chi^2).$$

Looking back to page 90 of ALRAND Report 50, Volume I, we see this is simply the $\Gamma$ distribution with $\lambda = 1/2$ and $k = (n - 1)/2$.

1. Illustration. The standard deviation of a random sample of 16 requisitions for an item is $9.6/\sqrt{15}$, based on units per requisition. Assuming the requisition size in units per requisition is normally distributed, find 95% confidence limits for the standard deviation of the historical requisition size for this item and also for the size of requisitions to be experienced in the future.

a. Argument. The degrees of freedom are n - 1 = 15. For c = .95, we find in Table IX

$$\chi^2_{.975} = 27.5, \qquad \chi^2_{.025} = 6.26.$$

Hence

$$\sqrt{\chi^2_{.975}} = 5.24, \qquad \sqrt{\chi^2_{.025}} = 2.50,$$

and our confidence interval estimate is

$$\left(\frac{\sqrt{15}\,s}{5.24},\ \frac{\sqrt{15}\,s}{2.50}\right) = \left(\frac{9.6}{5.24},\ \frac{9.6}{2.50}\right)$$

or (1.83, 3.84), roughly (2, 4).
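The interval for $\sigma$ is easy to script once the two table values are in hand. In this sketch the function name is ours. The chi-square points 6.26 and 27.5 are entered by hand from Table IX rather than computed.

```python
import math

def sigma_ci(n, s, chi2_lower, chi2_upper):
    """100c% interval for sigma, based on (n - 1) s^2 / sigma^2 being chi-square
    with n - 1 d.f.; chi2_lower is the (1 - c)/2 point, chi2_upper the (1 + c)/2 point."""
    scale = math.sqrt(n - 1) * s
    return scale / math.sqrt(chi2_upper), scale / math.sqrt(chi2_lower)

# Illustration 1: n = 16, s = 9.6 / sqrt(15), 15 d.f. points from Table IX
low, high = sigma_ci(16, 9.6 / math.sqrt(15), chi2_lower=6.26, chi2_upper=27.5)
print(f"({low:.2f}, {high:.2f})")
```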

A few additional remarks are in order about this distribution function. The graphs of a few chi-square densities look like Figure 5.

Figure 5


Table IX. Values of the Cumulative χ² Distribution

Sample   Degrees of   Cumulative Probability
Size     Freedom
n        n - 1        .010     .025     .05      .10      .90      .95      .975     .990

2        1            .0002    .001     .0039    .0158    2.71     3.84     5.02     6.63
3        2            .0201    .0506    .103     .211     4.61     5.99     7.38     9.21
4        3            .115     .216     .352     .584     6.25     7.81     9.35     11.34
5        4            .297     .484     .711     1.064    7.78     9.49     11.14    13.28
6        5            .554     .831     1.145    1.61     9.24     11.07    12.83    15.09
7        6            .872     1.24     1.64     2.20     10.64    12.59    14.45    16.81
8        7            1.24     1.69     2.17     2.83     12.02    14.07    16.01    18.48
9        8            1.65     2.18     2.73     3.49     13.36    15.51    17.53    20.09
10       9            2.09     2.70     3.33     4.17     14.68    16.92    19.02    21.67
11       10           2.56     3.25     3.94     4.87     15.99    18.31    20.48    23.21
12       11           3.05     3.82     4.57     5.58     17.28    19.68    21.92    24.72
13       12           3.57     4.40     5.23     6.30     18.55    21.03    23.34    26.22
14       13           4.11     5.01     5.89     7.04     19.81    22.36    24.74    27.69
15       14           4.66     5.63     6.57     7.79     21.06    23.68    26.12    29.14
16       15           5.23     6.26     7.26     8.55     22.31    25.00    27.49    30.58
17       16           5.81     6.91     7.96     9.31     23.54    26.30    28.85    32.00
18       17           6.41     7.56     8.67     10.09    24.77    27.59    30.19    33.41
19       18           7.01     8.23     9.39     10.86    25.99    28.87    31.53    34.81
20       19           7.63     8.91     10.12    11.65    27.20    30.14    32.85    36.19
21       20           8.26     9.59     10.85    12.44    28.41    31.41    34.17    37.57
22       21           8.90     10.28    11.59    13.24    29.62    32.67    35.48    38.93
23       22           9.54     10.98    12.34    14.04    30.81    33.92    36.78    40.29
24       23           10.20    11.69    13.09    14.85    32.01    35.17    38.08    41.64
25       24           10.86    12.40    13.85    15.66    33.20    36.42    39.36    42.98
26       25           11.52    13.12    14.61    16.47    34.38    37.65    40.65    44.31
27       26           12.20    13.84    15.38    17.29    35.56    38.89    41.92    45.64
28       27           12.88    14.57    16.15    18.11    36.74    40.11    43.19    46.96
29       28           13.56    15.31    16.93    18.94    37.92    41.34    44.46    48.28
30       29           14.26    16.05    17.71    19.77    39.09    42.56    45.72    49.59
31       30           14.95    16.79    18.49    20.60    40.26    43.77    46.98    50.89
41       40           22.16    24.43    26.51    29.05    51.80    55.76    59.34    63.69
51       50           29.71    32.36    34.76    37.69    63.17    67.50    71.42    76.15
61       60           37.48    40.48    43.19    46.46    74.40    79.08    83.30    88.38
71       70           45.44    48.76    51.74    55.33    85.53    90.53    95.02    100.4
81       80           53.54    57.15    60.39    64.28    96.58    101.9    106.6    112.3
91       90           61.75    65.65    69.13    73.29    107.6    113.1    118.1    124.1
101      100          70.06    74.22    77.93    82.36    118.5    124.3    129.6    135.8

For n' degrees of freedom the mean value is n' and the variance is 2n', while, when n' is greater than 2, the mode is at n' - 2.
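These moments can be checked by simulation. The sketch below uses an arbitrary seed and sums squares of independent N(0, 1) draws from Python's `random` module. It compares the sample mean and variance with n' and 2n' for n' = 5. The function name is our own.

```python
import random
import statistics

def simulate_chi_square(df, trials=50_000, seed=1):
    """Return `trials` draws of x1^2 + ... + x_df^2 with each x_i ~ N(0, 1)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(trials)]

df = 5
draws = simulate_chi_square(df)
print("mean", statistics.mean(draws), "theory", df)
print("variance", statistics.variance(draws), "theory", 2 * df)
```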

Next, to make it obvious that this new distribution belongs to small sample theory, we note that, when $(x_1, x_2, \ldots, x_n)$ is a random sample from $N(\mu, \sigma^2)$,

$$\frac{\dfrac{\bar x - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{\sum (x_i - \bar x)^2}{\sigma^2(n - 1)}}} \quad\text{boils down to}\quad \frac{\bar x - \mu}{s/\sqrt{n}},$$

our "Student-t" variable. Note that the denominator in the first expression is the square root of our $\chi^2$ divided by (n - 1), which is equivalent to the square root of $s^2/\sigma^2$. Right here you might expect that t becomes normal as n gets bigger, since $s^2/\sigma^2$ then settles down near 1. Also the choice of the concept of degrees of freedom becomes more meaningful when we see we are really referring to the number of random variables independently chosen from the normal distribution. In symbols we have found

$$t = \frac{x}{\sqrt{\chi^2/n}}$$

if x comes from N(0, 1) and $\chi^2$, with n degrees of freedom, is independent of x.

Student's t, therefore, affords the solution to a variety of problems beyond that for which it was originally intended, because it is applicable to all cases which can be reduced to a comparison of the deviation of a normal variate with an independently distributed estimate of its standard deviation, derived from the sums of squares of homogeneous normal deviations either from the true mean of the distribution or from the means of samples.
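The construction $t = x/\sqrt{\chi^2/n}$ can be simulated directly. In this sketch the seed, trial count and function name are arbitrary choices of ours. The fraction of draws with |t| beyond the Table VIII point 3.182 for 3 degrees of freedom should come out close to .05, and the fraction beyond 2 close to the .139 two-sided tail of the 3-d.f. t-distribution.

```python
import math
import random

def simulate_t(df, trials=50_000, seed=2):
    """Return draws of t = x / sqrt(chi2 / df), where x ~ N(0, 1) and chi2 is an
    independent sum of df squared N(0, 1) variates."""
    rng = random.Random(seed)
    draws = []
    for _ in range(trials):
        x = rng.gauss(0.0, 1.0)
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        draws.append(x / math.sqrt(chi2 / df))
    return draws

draws = simulate_t(3)
beyond_2 = sum(abs(t) > 2.0 for t in draws) / len(draws)
beyond_tc = sum(abs(t) > 3.182 for t in draws) / len(draws)
print(f"P(|t| > 2)     ~ {beyond_2:.3f}")
print(f"P(|t| > 3.182) ~ {beyond_tc:.3f}")
```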


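We can distinguish by virtue of degrees of freedom, by saying that for a random sample $(x_1, x_2, \ldots, x_n)$ from $N(\mu, \sigma^2)$ both of the following random variables are distributed like chi-square, and further

$$\sum_{i=1}^{n}\left(\frac{x_i - \mu}{\sigma}\right)^2 \quad\text{has } n \text{ degrees of freedom},$$

while

$$\sum_{i=1}^{n}\left(\frac{x_i - \bar x}{\sigma}\right)^2 \quad\text{has } (n - 1) \text{ degrees of freedom}.$$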


There are many applications for which we need a transformed version of $\chi^2$. Suppose our random variables are $x_1, x_2, \ldots, x_n$, each being N(0, 1). Then

Variable                                   Frequency function with n d.f.
$\sum_{i=1}^{n} x_i^2$                     $f_{\chi^2;n}(x)$
$\frac{1}{n}\sum_{i=1}^{n} x_i^2$          $n\,f_{\chi^2;n}(nx)$
$\sqrt{\sum_{i=1}^{n} x_i^2}$              $2x\,f_{\chi^2;n}(x^2)$
$\sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$   $2nx\,f_{\chi^2;n}(nx^2)$

A fine algebraic property of $\chi^2$ is this: if $\chi_1^2$ has $n_1$ degrees of freedom and $\chi_2^2$, independent of it, has $n_2$ degrees of freedom, then $\chi_1^2 + \chi_2^2$ is $\chi^2$ with $n_1 + n_2$ degrees of freedom. This reproductive property is shared by the binomial (with a common p), Poisson, and normal distributions.
2. Illustration. Suppose you are interested in only 90% confidence in estimating the population variance $\sigma^2$ from a sample of size 10 with $s^2 = 195$. From Table IX, for 9 degrees of freedom, we find

$$\chi^2_{.05} = 3.33, \qquad \chi^2_{.95} = 16.92.$$

Then the interval estimate is given by

$$\Pr\left\{3.33 < \frac{9(195)}{\sigma^2} < 16.92\right\} = .90.$$

Thus the interval estimate for $\sigma^2$ is

$$\frac{9(195)}{16.92} < \sigma^2 < \frac{9(195)}{3.33}$$

or

$$104 < \sigma^2 < 527.$$
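The variance interval follows the same pattern as the interval for $\sigma$, only without the square roots. Here is a minimal sketch with a hypothetical function name, and with the Table IX points for 9 degrees of freedom entered by hand.

```python
def variance_ci(n, s2, chi2_lower, chi2_upper):
    """Interval (n-1)s^2/chi2_upper < sigma^2 < (n-1)s^2/chi2_lower."""
    ss = (n - 1) * s2                     # (n - 1) s^2, the sum of squared deviations
    return ss / chi2_upper, ss / chi2_lower

low, high = variance_ci(10, 195, chi2_lower=3.33, chi2_upper=16.92)   # c = .90
print(f"{low:.1f} < sigma^2 < {high:.1f}")
```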

D. Fisher's Z and Snedecor's F Distributions.

In 1924 Fisher concerned himself with the distribution of quotients of sums of squares of normally distributed random variables. He set

$$\frac{\chi_1^2/n_1}{\chi_2^2/n_2} = e^{2Z}$$

and found the distribution of Z. As mysteriously complicated as this appears at first, the reason for it is just that simple. He wanted to devise a testing function for the difference between two variances, $s_1^2$ and $s_2^2$, derived from two samples from normal distributions. If we went about this as we have with other statistics, we would have considered how often $s_1 - s_2$ would exceed its observed value. Of course our testing statistic would have to have $\sigma_1$ and $\sigma_2$ in it. The only way to get rid of them would be to replace them by $s_1$ and $s_2$, respectively. But remember how we had to get a new distribution when we similarly changed the test statistic

$$\frac{\bar x - \mu}{\sigma/\sqrt{n}} \quad\text{to}\quad \frac{\bar x - \mu}{s/\sqrt{n}}$$

for small samples. Here we would be trying to revamp a statistic built on $(s_1 - s_2) - (\sigma_1 - \sigma_2)$, whose scale depends on the unknown $\sigma_1$ and $\sigma_2$, into one in which they are replaced by their sample estimates.

Fisher said the only exact treatment can come from eliminating the unknown $\sigma_1$ and $\sigma_2$ from the distribution by replacing the distribution of $s_i$ by that of $\ln s_i$, i = 1, 2. In this way you will note our interest goes from $s_1 - s_2$ to $\ln s_1 - \ln s_2$ to $\ln(s_1/s_2)$. Moreover, whereas the sampling errors in $s_i$ are proportional to $\sigma_i$, the sampling errors of $\ln s_i$ depend only on the size of the sample from which $s_i$ was calculated.

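In 1934 Snedecor transformed the variable to e^{2Z} and, out of honor to Fisher, wrote F for e^{2Z}. He gave the probability element to be

$$\frac{\Gamma\!\left(\frac{n_1+n_2}{2}\right)}{\Gamma\!\left(\frac{n_1}{2}\right)\Gamma\!\left(\frac{n_2}{2}\right)}\left(\frac{n_1}{n_2}\right)^{n_1/2} F^{\frac{n_1}{2}-1}\left(1+\frac{n_1}{n_2}F\right)^{-\frac{n_1+n_2}{2}}\,dF, \qquad F > 0,$$

where we have n₁ and n₂ degrees of freedom.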
This is a highly tabulated function; Tables X and XI give values of F for 90% and 95% cumulative probability (confidence), respectively. The following characteristics apply:

$$\text{Mean} = \mu_F = \frac{n_2}{n_2-2}, \qquad n_2 > 2,$$

$$\text{Variance} = \sigma_F^2 = \frac{2n_2^2\,(n_1+n_2-2)}{n_1\,(n_2-2)^2\,(n_2-4)}, \qquad n_2 > 4,$$

$$\text{Mode} = \frac{n_2\,(n_1-2)}{n_1\,(n_2+2)}, \qquad n_1 > 2.$$
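One way to convince oneself of the mean and variance formulas is to integrate the probability element numerically. The sketch below does this for n₁ = 7, n₂ = 9 (the degrees of freedom of the illustration that follows); the truncation point, step count, and function names are arbitrary illustrative choices.

```python
import math


def f_pdf(x: float, n1: int, n2: int) -> float:
    """Snedecor's F frequency function with n1 and n2 degrees of freedom."""
    if x <= 0.0:
        return 0.0
    log_const = (math.lgamma((n1 + n2) / 2) - math.lgamma(n1 / 2)
                 - math.lgamma(n2 / 2) + (n1 / 2) * math.log(n1 / n2))
    return math.exp(log_const + (n1 / 2 - 1) * math.log(x)
                    - ((n1 + n2) / 2) * math.log1p(n1 * x / n2))


def moment(k: int, n1: int, n2: int, upper: float = 200.0, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of E[F^k], truncating the integral at `upper`."""
    h = upper / steps
    return h * sum(((j + 0.5) * h) ** k * f_pdf((j + 0.5) * h, n1, n2)
                   for j in range(steps))


n1, n2 = 7, 9
mean = moment(1, n1, n2)
var = moment(2, n1, n2) - mean ** 2
# Numerical value beside the closed-form value from the formulas above.
print(round(mean, 4), round(n2 / (n2 - 2), 4))
print(round(var, 4), round(2 * n2**2 * (n1 + n2 - 2) / (n1 * (n2 - 2)**2 * (n2 - 4)), 4))
```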

Using our definition of χ² for samples, we see that if n₁ + 1 and n₂ + 1 are the sample sizes (so that n₁ and n₂ are the degrees of freedom),

$$F = \frac{\chi_1^2/n_1}{\chi_2^2/n_2} = \frac{n_1 s_1^2/(n_1\sigma_1^2)}{n_2 s_2^2/(n_2\sigma_2^2)} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}.$$

Just as in the nonsymmetric case of χ², we will here designate our lower and upper confidence limit multipliers by $F_{LC}$ and $F_{UC}$, respectively, in contrast to the symmetric cases of the normal z and Student's t.

What is tabulated is the ratio of the variances of two samples, of possibly different sizes, from a standard unit normal distribution. Like the t-distribution, it is independent of the population variance if both samples are drawn from the same population (σ₁ = σ₂ = σ), i.e.,

$$F = \frac{s_1^2/\sigma^2}{s_2^2/\sigma^2} = \frac{s_1^2}{s_2^2}.$$
Our general probability statement, therefore, for any two different normal populations and for any two different size samples with variances s₁² and s₂² is

$$\Pr\left\{F_{LC} < \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} < F_{UC}\right\} = c$$

or

$$\Pr\left\{\frac{1}{F_{UC}}\,\frac{s_1^2}{s_2^2} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{1}{F_{LC}}\,\frac{s_1^2}{s_2^2}\right\} = c.$$


This gives a probabilistic hold on relative precisions, if you will, from two samples, and thereby on the variance ratio σ₁²/σ₂².

The tables are set up for ratio values greater than unity, that is, for the larger variance in the numerator (F > 1). Consequently the lower confidence limit $F_{LC}$ for a fixed confidence cannot be read directly from the table. However, it can be found indirectly from the table, as follows.

Think of the ratio F = s₁²/s₂² as in Figure 6.

Figure 6
Since the identification of which sample's variance is in the numerator is arbitrary, the reciprocal 1/F = s₂²/s₁² is itself an F variable, with the degrees of freedom interchanged. (When n₁ = n₂ this symmetry gives Pr{F < 1} = Pr{F > 1} = 1/2.)

Suppose we wish to find, for confidence c, $F_{UC}$ and $F_{LC}$ when the numerator sample has n₁ degrees of freedom and the denominator sample has n₂ degrees of freedom. If F' is a specific value of F, then

$$\Pr\{F(n_1,n_2) < F'\} = \Pr\left\{\frac{1}{F(n_1,n_2)} > \frac{1}{F'}\right\} = \Pr\left\{F(n_2,n_1) > \frac{1}{F'}\right\},$$

so that

$$F_{LC}(n_1,n_2) = \frac{1}{F_{UC}(n_2,n_1)}.$$

1. Illustration. Suppose we have a sample of size 8 with variance 7.14 and a sample of size 10 with variance 3.21. Find a 90% confidence interval estimate for the quotient of the populations' variances.

a. Argument. The larger variance goes into the numerator of F, so n₁ = 7 and n₂ = 9. Then

$$F = 7.14/3.21 = 2.22.$$

From the 95% cumulative Table XI we see that

$$F_{UC}(7,9) = 3.29 \quad\text{while}\quad F_{LC}(7,9) = \frac{1}{F_{UC}(9,7)} = \frac{1}{3.68}.$$

So

$$\Pr\left\{\frac{1}{3.68} < \frac{2.22}{\sigma_1^2/\sigma_2^2} < 3.29\right\} = .9$$

or

$$\Pr\left\{\frac{2.22}{3.29} < \frac{\sigma_1^2}{\sigma_2^2} < 3.68(2.22)\right\} = .9.$$

So a 90% confidence interval estimate of the quotient of the population variances is (.67, 8.17).


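2. Illustration. Two different samples of size 25 each have variances 1.04 and .51. What can be said about the population variances?

a. Argument. By an argument similar to that in the previous illustration, we find for n₁ = n₂ = 24 and for c = .90 that $F_{UC} = 1.98 = 1/F_{LC}$ (the two limits are reciprocals because n₁ = n₂), so

$$\Pr\left\{\frac{1}{1.98} < \frac{1.04/.51}{\sigma_1^2/\sigma_2^2} < 1.98\right\} = .9$$

or

$$\Pr\left\{\frac{1.04}{.51(1.98)} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{(1.04)(1.98)}{.51}\right\} = .9.$$

So (1.03, 4.04) is a 90% confidence interval estimate of the quotient σ₁²/σ₂². Since this interval does not contain 1, we would conclude, at this confidence, that the first population has the larger variance.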
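Both illustrations reduce to dividing the observed variance ratio by the two tabulated points. A minimal sketch, with the Table XI values passed in by hand (the function name is illustrative). Carrying the unrounded ratio 7.14/3.21 = 2.224 through gives roughly (.68, 8.19) for the first illustration; the text's figures use F rounded to 2.22.

```python
def variance_ratio_interval(s1_sq, s2_sq, f_uc, f_uc_swapped):
    """Confidence interval for sigma1^2 / sigma2^2.

    f_uc         -- tabulated upper point F_UC(n1, n2)
    f_uc_swapped -- tabulated upper point F_UC(n2, n1); F_LC(n1, n2) = 1 / f_uc_swapped
    """
    ratio = s1_sq / s2_sq
    f_lc = 1.0 / f_uc_swapped
    return ratio / f_uc, ratio / f_lc


# Illustration 1: sizes 8 and 10, so n1 = 7, n2 = 9 (F_UC(7,9) = 3.29, F_UC(9,7) = 3.68).
print([round(v, 2) for v in variance_ratio_interval(7.14, 3.21, 3.29, 3.68)])
# Illustration 2: sizes 25 and 25, so n1 = n2 = 24 and F_UC(24,24) = 1.98.
print([round(v, 2) for v in variance_ratio_interval(1.04, 0.51, 1.98, 1.98)])
```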

It should be noted that this F-distribution includes the normal distribution, the χ²-distribution and Student's t-distribution as special cases, per the following:

(1) n₂ = ∞: F = χ²/n₁.

(2) n₁ = ∞: 1/F = χ²/n₂.

(3) n₁ = 1: F = t², i.e., √F = |t|, with t having n₂ degrees of freedom.

(4) n₁ = 1 and n₂ = ∞: F = z².

3. There is a very important point to make regarding the need for an F-distribution analysis before making a t-distribution analysis on the difference between two sample means, such as was done in the illustration in paragraph B3, page 38. Therein it was assumed that the variances of the base populations were equal and/or that both populations were the same. We pooled the two sample variances to get a good estimate of this common variance, which the application of Student's t needed and assumed to be the same for the two base populations.

Therefore it falls upon us, prior to a Student's t analysis on the difference of the sample means, to determine whether the sample variances are enough alike to support the assumption that they are independent estimates of a common population variance. So the F-distribution should enter the scene first.

4. Finally, and in contrast, for large samples we should remember that two sample standard deviations are compared by considering the variance of the random variable that is the difference of the two standard deviations, and that we obtain it from the variance of the distribution of sample standard deviations, which is σ²/2n.

a. Illustration. Two independent samples of sizes 744 and 22 have standard deviations 1.6 and 2.1, respectively. Compare the samples with regard to the possibility of their coming from a common population. For example, such a problem might arise in dealing with superseding items. The original FSN supported a known population with known demands over time. We have collected demands on the superseding FSN for a period of time, and now we would like to know if it is supporting the same population as the superseded item.


(1) Argument. Assuming it is, we estimate the variance by pooling to be

$$s^2 = \frac{743(1.6)^2 + 21(2.1)^2}{744 + 22 - 2} = 2.61.$$

Therefore

$$\sigma^2_{s_1-s_2} = \sigma^2_{s_1} + \sigma^2_{s_2} = \frac{2.61}{2(744)} + \frac{2.61}{2(22)} = .0611,$$

$$\sigma_{s_1-s_2} = .247, \text{ say } .25.$$


So the calculated standard deviation of the random difference between the standard deviations of two samples of these sizes is 0.25. Thus

$$\Pr\left\{-z_c < \frac{(s_1-s_2)-\mu_{s_1-s_2}}{\sigma_{s_1-s_2}} < z_c\right\} = c, \qquad \sigma_{s_1-s_2} \approx 0.25.$$

Under the hypothesis of a common population the mean of s₁ - s₂ is 0, so (s₁ - s₂)/0.25 is approximately N(0, 1). Our particular difference, 2.1 - 1.6 = .5, is .5/.25 = 2 standard deviations; a band of two standard deviations just covers this random difference between the sample deviations, so we would assume a common population.
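The large-sample comparison can be sketched as below, assuming (as the text does) that Var(s) ≈ σ²/2n and pooling under the hypothesis of a common population. The function name is illustrative. Without rounding the standard error to .25, the standardized difference comes out near 2.02, just beyond the 1.96 point for 95%, so the verdict is a borderline one.

```python
import math


def sd_difference_z(n1, s1, n2, s2):
    """Large-sample z for the difference of two sample standard deviations.

    Pools the variances (assuming a common population) and uses
    Var(s) ~ sigma^2 / (2n) for each sample.
    """
    pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(pooled / (2 * n1) + pooled / (2 * n2))
    return pooled, se, (s2 - s1) / se


pooled, se, z = sd_difference_z(744, 1.6, 22, 2.1)
print(f"pooled s^2 = {pooled:.2f}, sd of s1 - s2 = {se:.3f}, z = {z:.2f}")
```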
III. TOLERANCE


A.  Floating  Interval. 

Heretofore we have studied the theory and method of estimating magnitudes of population characteristics by intervals, namely confidence intervals. Now we wish to speak briefly of another type of interval estimate, which is used when you want, so to speak, to cover a range of values and not just a single value. In particular, it is frequently desirable to make an estimate which, with a certain confidence as we have used the concept, contains nearly all of the population values. There are times when you and I would like to know within what limits a certain percentage, say 99%, of the base population lies.

Obviously if we knew the mean μ and standard deviation σ of the base population, and also if it were normally distributed, then (μ - 3σ, μ + 3σ) would be a satisfactory interval. In lieu of such knowledge of the base population, we can use the sample mean x̄ and the sample standard deviation s, and pick a $k_c$ such that x̄ - $k_c$s and x̄ + $k_c$s would include 99.7% of the base population with level of confidence c. The choice of $k_c$ depends as much on our further assumptions about the type of base population distribution as on the size of the sample.

The end points of such "floating" intervals are called statistical tolerance limits, and the interval itself a statistical tolerance interval. Obviously, as the sample size increases, these intervals tend to a fixed size, which depends on the percentage of the base population you wish to pick up. In contrast, confidence intervals decrease in width to zero as the sample size increases. Though both types vary less from sample to sample in position and width as the sample size grows, the confidence interval pinches in on the true value of the population parameter, while the statistical tolerance interval tends to a fixed size, since it gives limits within which an expected proportion of the population lies with some confidence.

So we see that a tolerance multiplier k, which depends on n, P, and c, is such that we can be 100c% confident that a proportion P of the population lies between x̄ - ks and x̄ + ks. There are tables that provide values of k if a normal distribution can be assumed, and there are tables for when it cannot be assumed. Historically the latter came first, from the work of S. S. Wilks, and we will discuss them later in the course. For the present we will restrict ourselves to the assumption of normality and use values of k from known tables with the experimental data in an earlier illustration.
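The manual reads k from tables. As an illustration only, a commonly used closed-form approximation due to Howe gives k ≈ z₍₁₊P₎/₂ · √[(n - 1)(1 + 1/n) / χ²₁₋c], where χ²₁₋c is the lower (1 - c) point of chi-square with n - 1 d.f. The sketch below approximates that chi-square point with the Wilson-Hilferty formula so that it needs only the Python standard library. Both approximations are swapped in here; they are not how Table XII was produced, and they agree with Table XII only to roughly two decimals. The function names are illustrative.

```python
import math
from statistics import NormalDist


def chi2_quantile(p: float, df: int) -> float:
    """Wilson-Hilferty approximation to the p-quantile of chi-square with df d.f."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3


def tolerance_k(n: int, coverage: float, confidence: float) -> float:
    """Approximate two-sided normal tolerance factor (Howe's approximation).

    With probability `confidence`, x_bar +/- k*s contains at least the
    proportion `coverage` of a normal population (sample size n).
    """
    df = n - 1
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    chi2 = chi2_quantile(1.0 - confidence, df)
    return z * math.sqrt(df * (1.0 + 1.0 / n) / chi2)


# Rows of Table XII are confidences c, columns are coverages P (n = 16).
for c in (0.90, 0.95, 0.99):
    print(c, [round(tolerance_k(16, p, c), 3) for p in (0.90, 0.95, 0.99)])
```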

B.  An  Illustration. 

From the data in the Project - Simulation, pages 8 to 12, tolerance intervals for each of the three different confidences .90, .95 and .99 can be calculated, and then the percentage of the base population they pick up can be given. For n = 16, we must modify the t-distribution coefficients 1.753, 2.131, 2.947 to the values in Table XII. The values in Table XII were taken from much more extensive tables whose reproduction here is not warranted for our immediate purpose.

Table XII. Tolerance factors k for n = 16 (rows: confidence c; columns: proportion P)

    c \ P     .90      .95      .99
    .90      2.246    2.676    3.514
    .95      2.437    2.903    3.812
    .99      2.872    3.421    4.492

Further, we will restrict ourselves in the examination of the data to 90% coverage, i.e., P = .90, and to c = .90. The calculations and results are given in Table XIII, using k = 2.246.


Table XIII

    Sample                     (x̄ - 2.246s, x̄ + 2.246s)     Empirical    Satisfactory
    Number    x̄        s       → rounded                      Proportion   Coverage
      1      4.375    1.204    (1.671, 7.079) → (2, 7)         91%          Yes
      2      3.938    2.00     (0, 8.430) → (0, 8)             98%          Yes
      3      4.750    1.24     (1.965, 7.535) → (2, 7)         91%          Yes
      4      5.125    1.65     (1.419, 8.831) → (2, 8)         96%          Yes
      5      5.000    1.86     (.822, 9.178) → (1, 9)          99%          Yes
      6      4.250    2.17     (0, 9.123) → (0, 9)             99.5%        Yes
      7      5.125    1.41     (1.958, 8.292) → (2, 8)         96%          Yes
      8      5.437    2.16     (.113, 10.761) → (1, 10)        99.5%        Yes
      9      4.625    2.36     (.676, 9.926) → (1, 9)          99%          Yes
     10      4.938    1.39     (1.817, 8.059) → (2, 8)         96%          Yes
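The interval column of Table XIII is x̄ ∓ 2.246s, with negative lower limits set to 0 (as in samples 2 and 6, presumably because the simulated observations cannot be negative). A short sketch reproducing the first four rows; the helper name is illustrative.

```python
K = 2.246  # Table XII factor for n = 16, P = .90, c = .90


def tolerance_interval(x_bar: float, s: float, k: float = K) -> tuple:
    """Return (x_bar - k*s, x_bar + k*s), with the lower limit floored at 0."""
    return max(0.0, x_bar - k * s), x_bar + k * s


# (x_bar, s) for samples 1 to 4 of Table XIII
rows = {1: (4.375, 1.204), 2: (3.938, 2.00), 3: (4.750, 1.24), 4: (5.125, 1.65)}
for number, (x_bar, s) in rows.items():
    low, high = tolerance_interval(x_bar, s)
    print(number, f"({low:.3f}, {high:.3f})")
```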

On the average we should get 90% of the tolerance intervals covering 90% of the population. We did better, since all of them covered at least 90%. Hence we acquire some reassurance in this illustration for saying we are 90% confident in any one sample estimate.

There are tables available that give a different value of k such that we can be 100c% confident that a proportion P of the base population will lie above x̄ - ks (below x̄ + ks). And, as we stated earlier, we have tables for all such cases when the assumption of normality is dropped.


IV.  SIGNIFICANCE 


A.  Significance  Testing. 

Suppose a person has obtained a set of observations of some process and has computed an average. He wants to know whether his statistic represents all observations for the process. If his experience is such that he intuitively believes the sample average is close to the average of the base population, he will make the statement that they are about equal. This is his hypothesis. To satisfy himself that his hypothesis is reasonable, he calculates the probability that a sample like his could have occurred by chance if the hypothesis were true. The significance test does not tell him whether his hypothesis is right or wrong. But by choosing the proper level of significance he knows, according to probability theory, that he will seldom reject a true hypothesis, and thus he can proceed as though his assumption were fact.

A significance test involves a random sample and a probabilistic computation which decides whether or not the sample could reasonably have come from an assumed distribution. Acceptance of the assumed distribution for parenthood comes when the observed sample result is no less probable than some predetermined small probability like .10, .05, or .01. This degree of rareness due to chance is called the significance level. If the result from the sample is less probable due to chance than this, we say the result is statistically significant, meaning that it signifies something other than chance. The region of values where the probability of occurrence is greater than this is called the acceptance region, while the complementary region is called the critical or rejection region. The latter words are descriptive of the decision to reject the parenthood of the assumed base population. This is commonly called rejecting the Null Hypothesis, which assumed no difference between the assumed base population and the sample other than what chance allowed.

Actually  this  amounts  to  saying  whether  the  computed  confidence 
interval  does  or  does  not  include  the  corresponding  parameter  of  the  base 
population.  At  the  outset  it  appears  that  a  confidence  interval  approach  to 
making  such  a  decision  has  the  advantage  of  giving  some  idea  of  how  large 
the  difference  between  statistic  and  parameter  is  likely  to  be  while  a  test 
of  significance  gives  a  cut  and  dried  yes  or  no. 

When we reject the Null Hypothesis, i.e., decide the discrepancy between the statistic and parameter is too rare to be due to chance, we are said to be invoking the principle of advocatus diaboli, or the Devil's Advocate. This derives from the role of that diabolical fellow, whose office was to make adverse criticism of what was deemed good.

Another word ought to be said here about the dependence of acceptance or rejection on the particular characteristic and its distribution function. We accept or reject the sample-parent association through such a device. Hence, as we will show later, it is possible, given a fixed sample and an assumed base population, to have two tests based on different statistics, one accepting and one rejecting.

Significance tests use, in general, critical regions in one of the three ways illustrated in Figure 7 for the particular value of 5%, where we have two one-tailed tests and a two-tailed test (with .025 in each tail).

Figure 7


Some  examples  may  now  be  in  order. 

1. Illustration. A certain stock number has the probability p = .8 that each requisition will be for a quantity of one unit of stock. During the past month 10,000 requisitions have been received, and 8,500 were for a single unit of stock. Is the units-per-requisition pattern changing?

a. Argument. Our Null Hypothesis is that the sample behavior of the last month is consistent with the long-known p = .8. Now if we use the normal approximation to the binomial distribution assumed true here, with n = 10,000 and p = .8, we have μ = np = 8,000 and σ = √(npq) = √1,600 = 40. Suppose we take a significance level of .01. Then we would reject the Null Hypothesis, i.e., we would reject this sample as coming by chance from the assumed base population, if x > 8,000 + (2.33)(40) = 8,093, where x = number of requisitions with a single unit per order; for this can happen in only 1% of many repeated cases due to chance. Since 8,500 > 8,093, we reject chance and claim a significant difference due to something other than chance. Hence we dissociate the sample and the assumed parent population: the size-order pattern has changed. We are more than 99% confident, though we do not usually speak so.

Confidence was used to reflect how often, due to chance, something we found in one case would happen in repeated cases. When we reject, as we just did, something judged rare due to chance, it is not quite the same thing as making a positive confidence interval statement. Still, there is a relationship.

b. Argument (Alternate). Now we could have computed a 99% confidence interval here. Since the hypothesis is p = .8 and, let us say, the alternative is p > .8, we use a one-sided confidence interval and say

$$\Pr\left\{\frac{.85 - p}{\sqrt{(.85)(.15)/10{,}000}} < 2.33\right\} = .99.$$

This can be reformed into

$$\Pr\{.842 < p\} = .99.$$

Since p = .8 < .842, we reject p = .8, as it is not in the confidence interval (.842, 1).
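Both arguments for the requisition illustration can be checked in a few lines. This sketch uses the exact one-tailed 1% normal point (about 2.326) from Python's statistics module instead of the rounded 2.33, which does not change either conclusion; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, p0, observed = 10_000, 0.8, 8_500
z = NormalDist().inv_cdf(0.99)                 # one-tailed 1% point, about 2.33

# Argument a: normal approximation to the binomial under H0: p = .8
mu, sigma = n * p0, math.sqrt(n * p0 * (1 - p0))   # 8000 and 40
critical = mu + z * sigma                            # about 8093
print(round(critical), observed > critical)          # True means reject H0

# Argument b: one-sided 99% lower confidence bound for p
p_hat = observed / n
lower = p_hat - z * math.sqrt(p_hat * (1 - p_hat) / n)
print(round(lower, 3), p0 < lower)                   # True means .8 lies outside (lower, 1)
```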

2.  Illustration.  Suppose  Washington  decorates  our  unit  when  we 
are  very  effective  and  they  have  done  this  for  each  of  the  past  five  months. 
Would  you  say  from  the  statistical  point  of  view  that  our  probability  of 
being  decorated  exceeds  .5? 

a. Argument. From the significance testing point of view, if we accept p = .5, then the sample situation has a probability of occurring equal to (.5)⁵ = .031, and we would reject the hypothesis p = .5 at the 5% significance level, but not at the 1% level. Remember that if one uses a smaller value for the significance level, he runs a greater risk of accepting a false hypothesis.

b. Argument (Alternate). From another point of view we let x be the number of times in 5 months we are decorated and p be the probability of being decorated in any month. Then we ask that

$$\Pr\{x < 5\} < .95$$

so that

$$\Pr\{x = 5\} > .05.$$

This requires

$$p^5 > .05, \quad\text{or}\quad p > .55.$$

So p must be as large as .55 to keep the sample outcome from being less probable than .05. Hence we do not pick up p = .5 in our confidence interval (.55, 1), and so we reject p = .5 as a hypothesis. We must therefore agree that p exceeds .5.
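The two computations in this illustration are (.5)⁵ and the fifth root of .05. A minimal check of the arithmetic (the variable names are illustrative):

```python
p_all_five = 0.5 ** 5          # chance of 5 decorations in 5 months if p = .5
print(f"{p_all_five:.5f}", p_all_five < 0.05, p_all_five < 0.01)

p_min = 0.05 ** (1 / 5)        # smallest p with p**5 >= .05
print(f"{p_min:.3f}")
```

The first line confirms rejection at the 5% level but not at the 1% level; the second gives the lower end, about .55, of the interval in the alternate argument.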


B.  Relation  between  Confidence  Intervals  and  Tests  of  Significance. 

The practitioner usually prefers a confidence interval statement to one using only a test of significance, because the width of the confidence interval tells more about the reliance he can place on the results of the experiment. Still, when a test of significance is accompanied by the appropriate Operating Characteristic Curve (OCC), about the same information is provided. In order to understand this, let us first consider our situation in deciding whether to accept or to reject a hypothesis.

If we reject a hypothesis when we should not have, we say that a Type I error has been made. If we accept a hypothesis when we should not have, we say that a Type II error has been made. In general, in life these two situations constitute the alternatives in making wrong decisions or errors in judgment. Ideally we want tests to minimize such errors. Unfortunately, for a fixed sample size, when we decrease one type of error we increase the other. Only increasing the sample size reduces both.

In industry, in acceptance sampling, the probability of the Type I error is called the Producer's Risk and denoted by α, while the probability of a Type II error is called the Consumer's Risk and denoted by β. Obviously the Type I error is the basis of our familiar level of significance test. It represents the chance we are willing to take of being wrong in rejecting chance. Now we could eliminate Type II errors by never accepting hypotheses! But this would get us nowhere. Better that we study the probabilities of making Type II errors and hope for little chance of making them. The quantity (1 - β) is helpful here in that it indicates the ability, or power, of the test to reject the hypothesis if it is false. Hence, as a function of the true parameter value, it is called the Power Function.

A confidence interval can be used for a test of significance; this we have illustrated. Using a rejection criterion alone in the converse situation is not the proper way to think of a significance test. You should always think of the associated OC Curve as part of the test.

First let us look at several situations in which we highlight β.

1. Illustration. If p is the probability of a particular FSN being demanded in a day, suppose we order this item if, out of every 10 items demanded, one or more are for this FSN. The probability that the experience with 10 items does not lead us to order is a function of p. It is the operating characteristic function of the examination procedure and is

$$\beta = (1-p)^{10}.$$

In Figure 8 we see a graph of β.

Figure 9 gives a graph of 1 - β, the power function.

Figure 9


Now β represents the probability of not ordering the FSN when it has the probability p of being demanded and hence should be replaced according to this probability. Hence (1 - β) represents the probability of ordering. The graphs in Figures 8 and 9 show that the procedure is pretty much in line with the actuality it is intended to follow.
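The OC function of this ordering rule is simple enough to tabulate directly. A small sketch, taking n = 10 demanded items per look as in the illustration (the function name and the grid of p values are illustrative):

```python
def oc_beta(p: float, n: int = 10) -> float:
    """Probability of NOT ordering: none of the n demanded items is this FSN."""
    return (1.0 - p) ** n


for p in (0.0, 0.05, 0.10, 0.20, 0.30, 0.50):
    beta = oc_beta(p)
    print(f"p = {p:.2f}   beta = {beta:.3f}   power = {1 - beta:.3f}")
```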

2. Illustration. The Bureau of the Budget (BUBUD) claims that a certain FSN is not ordered by half the Stock Points, while the unit here feels it is. To test the situation you examine five Stock Points and decide to accept BUBUD's claim only if either all five Stock Points ordered or all did not order the item. Otherwise you will assume it is ordered by half the Stock Points.

Now the probability of accepting BUBUD's claim is a function of p, the probability of a Stock Point ordering this item. This function of p is the Power Function for your test; that is, it is (1 - β), where β is the probability of accepting your claim. Look at Figure 10.


Figure  10 


So if p = 1/4 or 3/4, your probability of not making a mistake is only about 1/4; that is, your probability of not accepting p = 1/2 when p ≠ 1/2 is not very high. When p = 1/2, β is the probability of accepting p = 1/2 when it should be accepted, and so 1 - β is the probability of not accepting it when it is true. But this is the Type I error, α, which in this case is .06:

$$\alpha = 1 - \beta = p^5 + (1-p)^5 = \left(\tfrac{1}{2}\right)^5 + \left(1-\tfrac{1}{2}\right)^5 = .031 + .031 = .06.$$

This test procedure is not very powerful, in that it does not strongly lead you to reject the hypothesis p = 1/2 when p ≠ 1/2.
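The power function here is p⁵ + (1 - p)⁵, the chance that all five or none of the five Stock Points order. A sketch evaluating it at the points discussed (the function name is illustrative):

```python
def power(p: float, n: int = 5) -> float:
    """Probability of accepting BUBUD's claim: all n Stock Points order, or none do."""
    return p ** n + (1.0 - p) ** n


for p in (0.25, 0.5, 0.75):
    print(p, round(power(p), 4))
# At p = 1/2 the value is the Type I error: alpha = 2 * (1/2)**5 = .0625
```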

Now let us take a numerical example and tie in both approaches, confidence intervals and significance testing, with the operating characteristic curve.

3. Illustration. An FSN has a mean requisition size of 300 and a standard deviation of 24. Suppose we want to know, at the 1% level of significance and from a sample of 64 requisitions, whether this mean requisition size has increased.

a. Argument. In customary notation we say

H₀: Null Hypothesis that D = 300 has not changed
H₁: alternate hypothesis that D > 300 and D has changed.

The 1% level of significance corresponds in normal theory to z = 2.33, and so to a sample mean of 300 + 2.33(24/√64) = 306.99 ≈ 307; we reject H₀ if the mean of the 64 requisitions exceeds 307.
Graphically this is given in Figure 11.

Figure 11


So our Type I error has probability α = .01. For each possible actual new value of D, there is a chance of accepting the old D = 300. To show this, let us first take the value D = 310 as the actual new average demand. Then the means of samples of size 64 are normally distributed about 310, and with some chance the sample means will fall to the left of the critical point for rejecting the old hypothesis, D = 300, as shown in Figure 12.

Figure 12 (β = .16, α = .01)

Now under the new hypothesis, with standard error 24/√64 = 3, the critical point 307 corresponds to z = (307 - 310)/3 = -1, and so β = Pr{z < -1} ≈ .16.

More generally we can calculate β for various new D, as given in Table XIV.

Table XIV

    D      290     295     300     305     310     315     320
    β      1.00    1.00    .99     .75     .16     .00     .00
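Table XIV can be regenerated from the normal distribution of the sample mean. A sketch, assuming as in the text σ = 24, n = 64, and a one-tailed 1% test; it uses the exact 1% point (about 2.326) rather than 2.33, which leaves the two-decimal values unchanged. The names are illustrative.

```python
import math
from statistics import NormalDist

MU0, SIGMA, N = 300.0, 24.0, 64
SE = SIGMA / math.sqrt(N)                          # standard error of the mean: 3
CRITICAL = MU0 + NormalDist().inv_cdf(0.99) * SE   # about 307; reject H0 above this


def beta(true_mean: float) -> float:
    """OC value: probability the sample mean stays below the critical point."""
    return NormalDist(true_mean, SE).cdf(CRITICAL)


for d in range(290, 325, 5):
    print(d, f"{beta(d):.2f}")
```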

The  OC  Curve  and  Power  Function  are  graphed  in  Figure  13. 


Figure  13 

Next let us perform an analysis, still at the 1% significance level (99% confidence), similar to what we did in Illustrations 1 and 2, pages 65 and 66, indicating the associated confidence interval approach. We want and have

$$\Pr\left\{\frac{\bar{D} - \mu_{\bar{D}}}{24/\sqrt{64}} < 2.33\right\} = .99,$$

which can be reformed into

$$\Pr\{\bar{D} - 7 < \mu_{\bar{D}}\} = .99,$$

so $(\bar{D} - 7, \infty)$ is the confidence interval.

Figure 14

Note that the range covering 99% of the activity under the curve in Figure 14, starting 7 units to the left of $\mu_{\bar{D}}$ and thence to the right, is about 16 units, which is about the span of indeterminacy in the OC Curve from 300 to 315.

We see from Figure 14 that there is a small chance of keeping the hypothesis D = 300 when D > 315, while we are almost certain of keeping it when D < 300. Were we to have used the confidence interval approach at c = .99, i.e., α = .01, our random interval would be one-sided simply because the alternate hypothesis is D > 300. Thus we would say

$$(\bar{D} - 7, \infty)$$

is the confidence interval estimator, and, as you know, this interval in repeated samples would on the average include $\mu_{\bar{D}} = \mu_D$ 99 out of 100 times. Also we recognize that for this same percentage of times the value of D̄ would land in a span of about 16 units.
The two approaches can be illustrated with respect to determining the sample size needed to detect differences between means. We can specify limits to the risks of Type I and Type II errors. This locates two points on the OC Curve. Selection of n follows from examining various OC Curves for different n and matching these two points.

On the other hand, we can specify the magnitude of difference between means which is our limit. Then we can compute the sample size which gives, with desired confidence, an interval of this length. Let us illustrate this.

4. Illustration. In the problem of Illustration 2, page 66, suppose we test the hypothesis of the FSN being demanded half the time, i.e., p = .5, by examining a sample of future demands. Now let us decide that

(1) the probability of rejecting p = .5 when it is correct is not to exceed .05. This amounts to saying α = .05.

(2) the probability of accepting p = .5 when p ≥ .6 or p ≤ .4 is not to exceed .05. This amounts to saying β = .05, which we split as .025 for each of the alternatives p = .6 and p = .4.

Find the minimum sample size and state the statistical decision rule.

a.  Argument.  Graphically  we  have  Figure  15. 


Let 

N  =  number  of  observations  in  sample 
x  =  number  of  times  particular  FSN  is  ordered 

Then 

area  under  u.  =  .  5  curve  to  right  of  — ~L=i=^L=r-  =  .025 

n/N(.5)(.5) 

area  -under  (jl  =  .  6  curve  to  left  of  - x  "  — =  .025 

n/N(.  6)(.  4) 

=  1.96  and  x  --•6N  =  -1.96. 

.  5nTn  .  49 <s/" N 

Hence we have the two simultaneous equations in x and N,

$$x = .5N + .980\sqrt{N}$$
$$x = .6N - .960\sqrt{N},$$

which yield

N = 377
x = 208.

When p = .5, then x − Np = 19 (since Np = 188.5 ≈ 189). Thus we would get a similar span of values to the left of 189. So we decide:

(1) accept the hypothesis p = .5 if, in 377 demands, the number of demands for this FSN falls in the range 189 ± 19, i.e., between 170 and 208.

(2) reject the hypothesis otherwise.

5. Illustration. In the last illustration retain everything, but now require: the probability of accepting p = .5 when actually p ≥ .6 is .05. (This one-sided requirement replaces −1.96 by −1.645 in the second equation.)

In this case we find N = 319, x = 177. So we say

a. Accept p = .5 if x lies between 142 and 177.

b. Reject p = .5 otherwise.
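The arithmetic of Illustrations 4 and 5 can be checked with a short Python sketch. It keeps the normal approximation used above and solves the two simultaneous equations for √N by subtracting one from the other. The function name `sample_size_and_limit` and its arguments are our own, and the text's rounded constants (1.96, .49) are replaced by exact normal quantiles and √(.6 × .4).

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_and_limit(p0, p1, alpha_tail, beta_tail):
    """Minimum N and upper acceptance limit x for testing p = p0 against p = p1 > p0.

    alpha_tail: probability of rejection in the upper tail when p = p0
    beta_tail:  probability of accepting p = p0 when actually p = p1
    Uses the normal approximation to the binomial, as in the text.
    """
    z_a = NormalDist().inv_cdf(1 - alpha_tail)   # 1.96 for .025
    z_b = NormalDist().inv_cdf(1 - beta_tail)    # 1.96 for .025, 1.645 for .05
    s0 = sqrt(p0 * (1 - p0))                     # .5 in the text
    s1 = sqrt(p1 * (1 - p1))                     # about .49 in the text
    # x = p0*N + z_a*s0*sqrt(N) and x = p1*N - z_b*s1*sqrt(N);
    # subtracting gives (p1 - p0)*sqrt(N) = z_a*s0 + z_b*s1.
    root_n = (z_a * s0 + z_b * s1) / (p1 - p0)
    n = ceil(root_n ** 2)
    x = round(p0 * n + z_a * s0 * sqrt(n))
    return n, x


print(sample_size_and_limit(0.5, 0.6, 0.025, 0.025))  # Illustration 4: (377, 208)
print(sample_size_and_limit(0.5, 0.6, 0.025, 0.05))   # Illustration 5: (319, 177)
```

Rounding N up keeps both risks at or below their stated limits; the acceptance limit is then recomputed from the p = .5 side.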

In summary, when the data present enough evidence to reject the hypothesis, the probability α of an incorrect judgment is known in advance, since α is used in locating the rejection region. On the other hand, if the data present insufficient evidence to reject the hypothesis, we are not sure what to do. We should specify a practically significant alternative and calculate β. In addition, if the size of the sample is under our control, we should pick it so that β is small. But in many practical problems the calculation of β may be difficult, if not impossible. So it is often better to say that we do not reject the hypothesis, rather than that we accept it, and then to estimate the parameter using a confidence interval.
V. POINT ESTIMATION


An estimate of a population parameter by a single number is called a point estimator. Historically, estimation problems concerned themselves with estimating parameters. One assumed that the distribution of probability over the base population is one of a family of distributions indexed by one or more real-valued parameters. Then estimates of the parameters were made on the basis of experimental observations.

Suppose (x₁, x₂, …, xₙ) is a random sample from a distribution which is characterized by an unknown parameter θ. Now θ could be the mean. What we try to do in point estimation is to develop a function of the sample (x₁, x₂, …, xₙ) which will have a distribution that will cluster about θ. More precisely, a point estimator for θ is a real single-valued function of (x₁, x₂, …, xₙ), say t(x₁, x₂, …, xₙ), whose distribution "clusters in some sense" around θ. This t-function is itself a random variable. Graphically we would like to get a distribution as shown in Figure 16.

[Figure 16: density f(t) of the estimator t(x₁, x₂, …, xₙ), clustered about θ]

Our job is to define the phrase "in some sense," i.e., to qualify it. The present jargon used in doing this begins with two statements:

t isn't offset → unbiased,
t is as narrow as possible → efficient.

In  addition  there  are  other  qualifications  we  will  discuss,  namely  being 
consistent  and  sufficient. 

A.  Unbiasedness. 

Suppose (x₁, x₂, …, xₙ) is a random sample from a distribution f(x) and suppose that there is a parameter θ which (partially) describes f(x). Let t(x₁, x₂, …, xₙ) be a random variable such that

$$E\big(t(x_1, x_2, \ldots, x_n)\big) = \theta,$$

where the expectation is taken over all possible random samples. Then t(x₁, x₂, …, xₙ) is called an unbiased estimator for θ. That is, the average value of t is θ.

1. Illustration. Suppose (x₁, x₂, …, xₙ) is a random sample from a distribution f(x) whose mean is θ, i.e.,

$$E(x) = \theta.$$

Let t(x₁, x₂, …, xₙ) = (1/n)(x₁ + x₂ + ⋯ + xₙ) = x̄. Then x̄ is an unbiased estimator for θ, as you already know, since

$$\begin{aligned}
E(t) = E(\bar{x}) &= E\left(\frac{x_1}{n} + \frac{x_2}{n} + \cdots + \frac{x_n}{n}\right) \\
&= E\left(\frac{x_1}{n}\right) + E\left(\frac{x_2}{n}\right) + \cdots + E\left(\frac{x_n}{n}\right) \\
&= \frac{1}{n}E(x_1) + \frac{1}{n}E(x_2) + \cdots + \frac{1}{n}E(x_n) \\
&= \frac{1}{n}\theta + \frac{1}{n}\theta + \cdots + \frac{1}{n}\theta \\
&= \theta, \quad \text{free of } n.
\end{aligned}$$

2. Illustration. Consider a sequence of n Bernoulli trials and the resulting binomial distribution of probability for the occurrence of the event of interest x times, i.e., on x of the trials. Then if p̂ = x/n is the relative frequency sample estimate of p, we know from E(x) = np that

$$E(\hat{p}) = E\left(\frac{x}{n}\right) = \frac{1}{n}E(x) = \frac{np}{n} = p, \quad \text{free of } n.$$

Therefore p̂ is an unbiased estimate for p.

3. Illustration. Suppose we again have the situation in the previous illustration, only now we are interested in estimating the ratio p/(1 − p). This ratio is often desired where the ratio of the proportions of two things is of interest. Suppose we consider samples (x₁, x₂) of size 2. Then the binomial variable which counts occurrences of the event of interest can be either 0, 1, or 2. Let q = 1 − p. Suppose our estimator for p/q is t(x₁, x₂). Assuming it is symmetric in x₁ and x₂, we can further assume that t takes on only three different values, one for each of the three values of the binomial variable x. Call these values a, b, and c, respectively. They occur, as you know, with probabilities q², 2pq, and p², respectively. Then the expected value of t over all samples of size 2 is

$$\begin{aligned}
E\big(t(x_1, x_2)\big) &= t(0,0)\Pr\{(0,0)\} + t(0,1)\Pr\{(0,1)\} + t(1,0)\Pr\{(1,0)\} + t(1,1)\Pr\{(1,1)\} \\
&= aq^2 + 2bpq + cp^2.
\end{aligned}$$
But if t is an unbiased estimator for p/q, this must equal p/q, regardless of the value of p. Now, since q², 2pq, and p² all lie between 0 and 1,

$$aq^2 + 2bpq + cp^2 \le |a| + 2|b| + |c|,$$

yet we can always find a value of p close enough to 1 so that

$$|a| + 2|b| + |c| < p/q.$$

Hence there is no unbiased estimator for p/q when n = 2. By a similar argument it can be shown that the same conclusion is true for any other value of n.

But  this  simply  means  that  we  cannot  find  one  set  of  numbers  {a,  b,  c} 
that  works  for  every  p.  We  still  might  be  able  to  find  a  correct  set  when 
p  is  known.  Practically  this  is  no  help,  however,  since  our  sampling 
problems  are  directed  to  finding  p. 

This last illustration is not to be regarded with too much sorrow. For though unbiasedness is a desirable property, it is not essential. An estimate that is slightly biased but very closely clustered could be more useful than an unbiased one that is widely spread. Moreover, as we shall show, if consistency exists, we know the bias disappears as the size of the sample increases. In the last illustration we know that when n is large, x/n should be near p and (n − x)/n near q. Hence their ratio, which reduces to x/(n − x), should be near p/q. In a later section this can be defended by showing it is consistent. Still, you will note that this statistic has no finite expected value for any fixed n. For when n = 2, we get
$$E\left(\frac{x}{2 - x}\right) = \frac{0}{2 - 0}\,q^2 + \frac{1}{2 - 1}\,2pq + \frac{2}{2 - 2}\,p^2 = \infty,$$

since the last term has a zero denominator whenever p > 0.

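We recall that, for a symmetric base distribution, the median of a sample unbiasedly estimates the population median. Also we now have another good reason for defining the sample variance, and hence the standard deviation, with n − 1 instead of n in the denominator, because

$$E\{s^2\} = E\left\{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}\right\} = \frac{1}{n - 1}\,E\left\{\sum_{i=1}^{n}(x_i - \bar{x})^2\right\},$$

and since

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n}(x_i - \mu)^2 - n(\bar{x} - \mu)^2,$$

we have

$$\begin{aligned}
E\left\{\sum_{i=1}^{n}(x_i - \bar{x})^2\right\} &= E\left\{\sum_{i=1}^{n}(x_i - \mu)^2\right\} - E\{n(\bar{x} - \mu)^2\} \\
&= \sum_{i=1}^{n}E\{(x_i - \mu)^2\} - nE\{(\bar{x} - \mu)^2\} \\
&= n\sigma^2 - n \times \frac{\sigma^2}{n} \\
&= (n - 1)\sigma^2,
\end{aligned}$$

$$E\{s^2\} = \frac{1}{n - 1} \times (n - 1)\sigma^2 = \sigma^2, \quad \text{free of } n.$$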
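A quick Monte Carlo check of E{s²} = σ² follows. The sketch assumes a normal base population with σ = 2 (any population with finite variance would serve) and compares the average of the n − 1 divisor estimate with that of the n divisor estimate, whose expectation is (n − 1)σ²/n. The function name and simulation sizes are illustrative choices.

```python
import random
from statistics import fmean


def mean_of_variance_estimates(n, sigma, reps, seed=1):
    """Average the (n - 1)-divisor and n-divisor variance estimates over many samples."""
    rng = random.Random(seed)
    with_n_minus_1, with_n = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = fmean(xs)
        ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations from x-bar
        with_n_minus_1.append(ss / (n - 1))
        with_n.append(ss / n)
    return fmean(with_n_minus_1), fmean(with_n)


unbiased, biased = mean_of_variance_estimates(n=5, sigma=2.0, reps=50_000)
print(unbiased, biased)   # near 4.0, and near 3.2 = (n - 1)/n * 4.0
```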


B.  Efficiency. 

Suppose t(x₁, x₂, …, xₙ) and t*(x₁, x₂, …, xₙ) are two unbiased estimators for the parameter θ with variances $\sigma_t^2$ and $\sigma_{t^*}^2$, respectively. Then the one with the smaller variance is called an efficient estimator for θ while the other is called an inefficient one. This is rather loose talk, and we should use modifiers of a comparative nature. Equivalently, but in another way, we say the efficiency of t relative to t* for estimating θ is

$$\frac{\sigma_{t^*}^2}{\sigma_t^2}.$$

So when the efficiency is less than 1, the other statistic is more efficient.

When  a  value  of  an  efficient  statistic  is  given,  it  is  called  an  efficient 
estimate. 

The following theorem is remarkable and has been known for a long time.

1. Theorem. Let (x₁, x₂, …, xₙ) be a random sample from a distribution whose mean is θ. Consider the weighted mean

$$\bar{x}_w = c_1x_1 + c_2x_2 + \cdots + c_nx_n,$$

where c₁ + c₂ + ⋯ + cₙ = 1. Then x̄_w is an unbiased estimator for θ, and its variance attains its minimum value when c₁ = c₂ = ⋯ = cₙ = 1/n.

a. Argument. Consider n = 2. Then (x₁, x₂) is our random sample from a population whose mean is θ and variance is σ². Now

$$\bar{x}_w = c_1x_1 + c_2x_2, \qquad c_1 + c_2 = 1,$$

$$\sigma_{\bar{x}_w}^2 = c_1^2\sigma^2 + c_2^2\sigma^2 = (c_1^2 + c_2^2)\,\sigma^2 = k^2, \text{ say}.$$

Geometrically we are considering only points on the line c₁ + c₂ = 1 that also lie on the circle c₁² + c₂² = (k/σ)², as seen in Figure 17.

In order to get the smallest value of k/σ and still get a point of intersection with the line c₁ + c₂ = 1, we want the circle to just touch the line. Then the radius is 1/√2, so k/σ is 1/√2. At this point c₁ = c₂ = 1/2, which says x̄_w should have the particular value x̄ for minimum variance, which then is σ²/2.

[Figure 17: the circle c₁² + c₂² = (k/σ)² tangent to the line c₁ + c₂ = 1]


This means that if you take a sample, you can't expect to do any better, so far as variance is concerned, than to take its mean to estimate the mean of the base population; no other weighted mean of the observations does better.

Thus the mean of the sample is called the most efficient estimator for the base mean.
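The theorem can also be checked numerically for n > 2. For independent observations with common variance σ², the variance of the weighted mean is σ²(c₁² + ⋯ + cₙ²), so the sketch below (the function name is ours) compares that factor for equal weights with the factor for randomly drawn nonnegative weights rescaled to sum to one.

```python
import random


def weighted_mean_variance_factor(weights):
    """Return Var(c1*x1 + ... + cn*xn) / sigma^2 for independent x's with common variance."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1 so the weighted mean stays unbiased")
    return sum(c * c for c in weights)


n = 4
print(weighted_mean_variance_factor([1 / n] * n))   # 0.25, i.e. 1/n

rng = random.Random(7)
trial_factors = []
for _ in range(5):
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    weights = [r / total for r in raw]        # random weights rescaled to sum to 1
    trial_factors.append(weighted_mean_variance_factor(weights))
print(trial_factors)                          # every entry is at least 0.25
```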

The forecast of quarterly demand is a point estimator. Locally it is developed using single exponential smoothing and past observations of demand. The demands are weighted, but the weights decrease geometrically. The reason for this technique is to give greater emphasis to the most recently experienced demand. The sum of the weights applied to the demand observations does not equal one, because the last term in the formula for single exponential smoothing contains a previous forecast, and it is also weighted.




The sum of all weighting factors does equal one, however. The equation for single exponential smoothing can be written as follows:

$$x_0 = c_1x_1 + c_2x_2 + \cdots + c_mx_m + (1 - c_1)^m\,\bar{x}_m,$$

where x₁ is the most recent demand, x̄_m is the earlier forecast, and

$$c_2 = c_1(1 - c_1), \quad c_3 = c_1(1 - c_1)^2, \quad \ldots, \quad c_m = c_1(1 - c_1)^{m-1},$$

and c₁ is always a positive fraction.


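The variance of x₀ is determined as follows, treating the demands x₁, …, x_m and the earlier forecast x̄_m as independent and the demands as having a common variance σ_x²:

$$x_0 = c_1x_1 + c_1(1 - c_1)x_2 + \cdots + c_1(1 - c_1)^{m-1}x_m + (1 - c_1)^m\,\bar{x}_m,$$

$$\begin{aligned}
\sigma_{x_0}^2 &= c_1^2\sigma_{x_1}^2 + c_1^2(1 - c_1)^2\sigma_{x_2}^2 + \cdots + c_1^2(1 - c_1)^{2(m-1)}\sigma_{x_m}^2 + (1 - c_1)^{2m}\sigma_{\bar{x}_m}^2 \\
&= \left[c_1^2 + c_1^2(1 - c_1)^2 + \cdots + c_1^2(1 - c_1)^{2(m-1)}\right]\sigma_x^2 + (1 - c_1)^{2m}\sigma_{\bar{x}_m}^2 \\
&= c_1^2\,\frac{1 - (1 - c_1)^{2m}}{1 - (1 - c_1)^2}\,\sigma_x^2 + (1 - c_1)^{2m}\sigma_{\bar{x}_m}^2 \\
&= \frac{c_1\left[1 - (1 - c_1)^{2m}\right]}{2 - c_1}\,\sigma_x^2 + (1 - c_1)^{2m}\sigma_{\bar{x}_m}^2 \\
&\to \frac{c_1}{2 - c_1}\,\sigma_x^2 \quad \text{as } m \to \infty,
\end{aligned}$$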

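but

$$\sigma_{n\text{-period average}}^2 = \frac{\sigma_x^2}{n},$$

so single exponential smoothing is as efficient as an n-period average where n = (2 − c₁)/c₁. Graphically:

[Figure 18: equivalent number of periods n = (2 − c₁)/c₁ plotted against the smoothing constant c₁]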


Depending on the efficiency of x̄_m, the forecast developed by the smoothing technique can theoretically be no better than an n-period mean. Of course, this conclusion assumes the demand distribution does not change.
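The geometric weights and the limiting variance factor c₁/(2 − c₁) can be checked numerically. The sketch assumes, as the derivation does, that the demands and the earlier forecast are independent; the function names and the choice c₁ = .2 are illustrative and not taken from the local forecasting program.

```python
def smoothing_weights(c1, m):
    """Weights on x1..xm (x1 most recent) and on the earlier forecast x-bar_m."""
    demand_weights = [c1 * (1 - c1) ** k for k in range(m)]
    old_forecast_weight = (1 - c1) ** m
    return demand_weights, old_forecast_weight


def variance_factor(c1, m, old_forecast_ratio=1.0):
    """Var(x0) / sigma_x^2, assuming independent terms; the earlier forecast's
    variance is taken as old_forecast_ratio * sigma_x^2."""
    weights, old_weight = smoothing_weights(c1, m)
    return sum(c * c for c in weights) + old_weight ** 2 * old_forecast_ratio


c1 = 0.2
weights, old_weight = smoothing_weights(c1, 12)
print(sum(weights) + old_weight)                   # 1.0 up to rounding
print(variance_factor(c1, 200), c1 / (2 - c1))     # both about 0.1111
print((2 - c1) / c1)                               # equivalent n-period average, about 9
```

With c₁ = .2 the smoothed forecast is, in the limit, about as steady as a 9-period moving average.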

2. Illustration 1. Let (x₁, x₂, …, xₙ) be a random sample from N(μ, σ²) and let x̄ be the mean and x̃ be the median. You know that $\sigma_{\bar{x}}^2 = \sigma^2/n$. Now we can show that

$$\sigma_{\tilde{x}}^2 = \frac{\pi}{2}\,\frac{\sigma^2}{n} + O\!\left(\frac{1}{n^{3/2}}\right).$$

Since x̄ is the most efficient estimator for μ in a normal distribution, the efficiency of x̃ relative to x̄ for estimating μ is

$$\frac{\sigma^2/n}{\pi\sigma^2/2n} = \frac{2}{\pi} \approx .64.$$

This means a sample of size 64 is just as good when taking the arithmetic mean as is one of size 100 when using the median.
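The 2/π figure can be reproduced by simulation. This sketch assumes standard normal samples of size 101 (odd, so the median is a single observation) and 4000 repetitions; the function name and these sizes are illustrative.

```python
import random
from math import pi
from statistics import fmean, median, pvariance


def relative_efficiency_of_median(n, reps, seed=3):
    """Estimate Var(mean) / Var(median) for samples of size n from N(0, 1)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(fmean(xs))
        medians.append(median(xs))
    return pvariance(means) / pvariance(medians)


eff = relative_efficiency_of_median(n=101, reps=4000)
print(eff, 2 / pi)   # both near .64
```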

3. Illustration 2. For a unit standard normal distribution, N(0, 1), we find the average deviation from the mean is √(2/π) = 0.79788 ≈ .8. Further, we find in sampling from such a distribution that the Mean Absolute Deviation (MAD) of the sample is an unbiased estimator for the base population MAD. Hence we have

$$E(1.25\,\text{MAD}) = \sigma,$$

which explains our correction to the "PROGRAM 61" calculation of MAD to estimate σ.

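How efficient, then, is the MAD? Writing the exact constant √(π/2) ≈ 1.2533 for 1.25 and taking μ as known, the variance of 1.25MAD from samples of size n is

$$\begin{aligned}
\sigma_{1.25\,\text{MAD}}^2 &= \operatorname{var}\left(\sqrt{\frac{\pi}{2}}\,\frac{\sum_{i=1}^{n}|x_i - \mu|}{n}\right) \\
&= \frac{\pi}{2n^2}\sum_{i=1}^{n}\operatorname{var}|x_i - \mu| \\
&= \frac{\pi}{2n^2} \times n \times \operatorname{var}|x - \mu| \\
&= \frac{\pi}{2n}\left\{E|x - \mu|^2 - \left[E|x - \mu|\right]^2\right\} \\
&= \frac{\pi}{2n}\left\{\sigma^2 - \sigma^2 \times \frac{2}{\pi}\right\} = \frac{\pi - 2}{2n}\,\sigma^2,
\end{aligned}$$

whereas the variance of the standard deviation s′ from samples of size n (corrected from s so that E(s′) = σ) is

$$\sigma_{s'}^2 = \frac{\sigma^2}{2n} + O\!\left(\frac{1}{n^2}\right).$$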

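Therefore the efficiency of 1.25MAD relative to s′ is

$$\text{Eff} = \frac{\dfrac{\sigma^2}{2n} + O\!\left(\dfrac{1}{n^2}\right)}{\dfrac{(\pi - 2)\,\sigma^2}{2n}} \approx \frac{1}{\pi - 2} = .8760.$$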

In practice one usually does not have μ and resorts to using x̄ in its place. In such cases our formula for σ in terms of the MAD must be corrected to

$$\hat{\sigma} = \sqrt{\frac{\pi}{2}}\;\frac{\sum_{i=1}^{n}|x_i - \bar{x}|}{n}\;\sqrt{\frac{n}{n - 1}}$$

so that it is unbiased. A similar correction is needed for s. Remember that we compare the variances of two statistics for relative efficiency only when each is an unbiased estimator of the same parameter. Going back to the correction for the sample MAD estimate of σ, we see from

$$\frac{1}{\sqrt{n(n - 1)}} = \frac{1}{n} + \frac{1}{2n^2} + \frac{3}{8n^3} + \frac{15}{48n^4} + \cdots$$

that

$$\frac{1}{\sqrt{n(n - 1)}} - \frac{1}{n}$$

gives .0054 for n = 10 and .0008 for n = 25. Hence for all practical purposes this correction is negligible.




Fisher remarked on these two estimators: "As n is made larger, therefore, the standard error of 1.25MAD tends to bear a constant ratio to that of s. The former is the larger in the ratio √(π − 2) : 1; in other words, the value of the standard deviation obtained from s² of a sample has greater weight by 14% than that obtained from 1.25MAD. To obtain a result of equal accuracy by the latter method, the number of observations must be increased by 14%."
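The MAD-based estimates of σ discussed above can be compared by simulation. This sketch assumes a normal base population with σ = 2 and samples of size 10; the function names are ours, and the exact constant √(π/2) ≈ 1.2533 stands in for the rounded 1.25. It also reproduces the size of the √(n/(n − 1)) correction quoted above.

```python
import random
from math import pi, sqrt
from statistics import fmean


def sigma_from_mad_known_mean(xs, mu):
    """sqrt(pi/2) * mean |x - mu|; the text rounds sqrt(pi/2) to 1.25."""
    return sqrt(pi / 2) * fmean(abs(x - mu) for x in xs)


def sigma_from_mad_sample_mean(xs):
    """Same idea with x-bar in place of mu, including the sqrt(n/(n-1)) correction."""
    n = len(xs)
    xbar = fmean(xs)
    return sqrt(pi / 2) * sum(abs(x - xbar) for x in xs) / sqrt(n * (n - 1))


rng = random.Random(11)
samples = [[rng.gauss(0.0, 2.0) for _ in range(10)] for _ in range(20_000)]
known_avg = fmean(sigma_from_mad_known_mean(s, 0.0) for s in samples)
sample_avg = fmean(sigma_from_mad_sample_mean(s) for s in samples)
print(known_avg, sample_avg)       # both near sigma = 2.0
print(1 / (pi - 2))                # large-sample efficiency relative to s', about .876

corrections = {n: round(1 / sqrt(n * (n - 1)) - 1 / n, 4) for n in (10, 25)}
print(corrections)                 # {10: 0.0054, 25: 0.0008}
```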

C.  Consistency. 

We say t(x₁, x₂, …, xₙ) is a consistent estimator for θ if

$$\lim_{n \to \infty}\Pr\{|t(x_1, x_2, \ldots, x_n) - \theta| < \delta\} = 1 \quad \text{for any } \delta > 0.$$

Fisher called this the common-sense criterion and stated it as follows: when applied to the whole population, the derived statistic should be equal to the parameter. This means that as n gets bigger, virtually all the probability of the distribution of the statistic t comes to lie in the interval (θ − δ, θ + δ). This convergence in probability to a constant is also convergence in distribution. From the graph in Figure 19 we see we could equivalently say:

[Figure 19]

for an arbitrary δ > 0 and ε > 0, no matter how small, we can find an n(δ, ε) such that

$$\Pr\{\theta - \delta < t(x_1, x_2, \ldots, x_n) < \theta + \delta\} > 1 - \varepsilon \quad \text{for } n > n(\delta, \varepsilon).$$

1. Theorem. Let (x₁, x₂, …, xₙ) be a sample from a distribution whose mean is θ and variance is σ². Then x̄ is a consistent estimator for θ.

a. Argument. We know E(x̄) = θ and $\sigma_{\bar{x}}^2 = \sigma^2/n$. In Tchebycheff's inequality

$$\Pr\{|\bar{x} - \theta| < \lambda\sigma_{\bar{x}}\} \ge 1 - \frac{1}{\lambda^2},$$

let δ = λσ_x̄. Then we obtain

$$\Pr\{|\bar{x} - \theta| < \delta\} \ge 1 - \frac{1}{(\delta/\sigma_{\bar{x}})^2} = 1 - \frac{\sigma^2}{n\delta^2}.$$

Hence

$$\lim_{n \to \infty}\Pr\{|\bar{x} - \theta| < \delta\} = 1.$$


You can see the property of consistency is concerned with the behavior of an estimator when the number n of elements in the sample is large. Actually we have used the Law of Large Numbers on several occasions to show that an estimate is consistent, for example, x̄ for μ and $s_x^2$ for $\sigma_x^2$. Then again we showed this for proportions with respect to probabilities in the case of the binomial distribution. However, it is possible in this case to get a strong conviction for it by a more detailed examination such as that given in Appendix A.

In general, if t(x₁, x₂, …, xₙ) is an unbiased estimator for θ and σ_t → 0 as n → ∞, we know the estimates more closely approach θ as n increases.
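Consistency of x̄ can be watched directly. The sketch below assumes a Uniform(0, 1) base population, so θ = .5 and σ² = 1/12, and compares the simulated Pr{|x̄ − θ| < δ} for δ = .05 with the Tchebycheff lower bound 1 − σ²/(nδ²); the function names and sample sizes are illustrative. The bound is conservative (it is even negative for n = 25), but both quantities climb toward 1 as n grows.

```python
import random
from statistics import fmean


def coverage(n, delta, reps, seed=5):
    """Simulated Pr{|xbar - 0.5| < delta} for samples of size n from Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.random() for _ in range(n))
        hits += abs(xbar - 0.5) < delta
    return hits / reps


def chebyshev_bound(n, delta, var=1 / 12):
    """Lower bound 1 - sigma^2 / (n * delta^2) from Tchebycheff's inequality."""
    return 1 - var / (n * delta ** 2)


results = {}
for n in (25, 100, 400):
    results[n] = coverage(n, 0.05, 2000)
    print(n, results[n], round(chebyshev_bound(n, 0.05), 4))
```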
D. Sufficiency.

In 1920 Fisher became impressed by what he called the characteristic of sufficiency. He assumed a normal base distribution with standard deviation σ. Then he considered the two common methods of estimating σ or σ² from a sample (x₁, x₂, …, xₙ), namely

$$n\sigma_1 = \sqrt{\frac{\pi}{2}}\;S(|x - \bar{x}|) \qquad \text{Mean Error}$$
$$n\sigma_2^2 = S(x - \bar{x})^2 \qquad \text{Mean Square Error},$$

sometimes called Peters' formula and Bessel's formula, respectively.

He showed the ratio of the variances for σ₁ and for σ₂ to be (π − 2), as we discussed in a preceding section. Then he considered various powers p of the deviations and showed that the precision of the mean square is a true maximum, i.e., for p = 2, while the variance is 14% greater for p = 1 and 9% greater for p = 3. Hence we have still another good reason for preferring Bessel's formula.

But even more important, he showed that for a given value of σ₂ the distribution of σ₁ is independent of σ. So when σ₂ is known, a value of σ₁ can give no additional information as to the true value of σ. The same can be said if any other estimator is substituted for σ₁. Consequently, the whole of the information concerning the base population variance which a sample provides is summed up in the single estimate σ₂. Now the same cannot be said for σ₁ being taken first, since then the distribution of σ₂ does involve σ. This means we could improve our estimate of σ, when we first determine σ₁, by taking σ₂.
One must remember that this unique superiority of σ₂ depends on the normal curve hypothesis for the base distribution. For some other curve, σ₁ might be the superior estimator for σ. As a matter of fact, it is when the base population is of the form

$$f(x) = \frac{1}{\sigma\sqrt{2}}\,e^{-\sqrt{2}\,|x - m|/\sigma},$$

a double exponential curve. In this case σ₁ must be altered to

$$n\sigma_1 = \sqrt{2}\;S(|x - \bar{x}|).$$

Fisher suggested we calculate β₂, the ratio of the fourth moment to the square of the second moment. If this is near 3, the Mean Square Error should be used; if it is near 6, perhaps we would be better off using σ₁ for our estimate of σ.
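Fisher's rule of thumb can be sketched in Python. The function below (our own name) returns β₂ = m₄/m₂² together with σ₁ from the mean error, using the normal-curve constant √(π/2), and σ₂ from the mean square error with divisor n, as in the formulas above. The two test populations are assumptions for illustration: a standard normal, and a double exponential with unit variance generated as a random sign times an exponential variate of mean 1/√2.

```python
import random
from math import pi, sqrt
from statistics import fmean


def fisher_summary(xs):
    """Return (beta2, sigma_1 from the mean error, sigma_2 from the mean square error).

    sigma_1 uses the normal-curve constant sqrt(pi/2); for a double exponential
    base the text replaces it with sqrt(2).
    """
    n = len(xs)
    xbar = fmean(xs)
    m2 = sum((x - xbar) ** 2 for x in xs) / n     # second central moment
    m4 = sum((x - xbar) ** 4 for x in xs) / n     # fourth central moment
    beta2 = m4 / m2 ** 2
    sigma_1 = sqrt(pi / 2) * sum(abs(x - xbar) for x in xs) / n
    sigma_2 = sqrt(m2)
    return beta2, sigma_1, sigma_2


rng = random.Random(2)
normal = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
laplace = [rng.choice((-1, 1)) * rng.expovariate(sqrt(2)) for _ in range(100_000)]
normal_summary = fisher_summary(normal)
laplace_summary = fisher_summary(laplace)
print(normal_summary)    # beta2 near 3: use the Mean Square Error
print(laplace_summary)   # beta2 near 6: the Mean Error may do better
```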

Later  we  will  see  that  when  this  property  of  sufficiency  exists  for 
an  estimator,  we  will  be  able  in  general  to  find  the  estimator  by  the  Method 
of  Maximum  Likelihood.  Also  such  a  statistic  will  be  most  efficient  if  a 
most  efficient  estimator  exists. 

The usual academic form in which the criterion of sufficiency is presented leaves a lot to be desired insofar as determining a sufficient estimator. The ordinary definition requires that you know the statistic before its sufficiency can be tested. This is why Fisher said he provided us with the Method of Maximum Likelihood: to provide a statistic for which the criterion of sufficiency is satisfied.

To  exemplify  this  concept  we  shall  examine  several  situations. 
1. Illustration. Consider the mean of the Poisson distribution

$$\frac{e^{-m}m^x}{x!}.$$

The parameter m may be estimated from the mean x̄ of the observed sample. Now it can be proved that the distribution of nx̄ is again the Poisson series

$$\frac{e^{-nm}(nm)^{n\bar{x}}}{(n\bar{x})!}.$$

The probability of drawing in order any particular sample (x₁, x₂, …, xₙ) is

$$\frac{e^{-nm}m^{n\bar{x}}}{x_1!\,x_2!\cdots x_n!},$$

and this may be divided into two factors, viz.,

$$\frac{e^{-nm}(nm)^{n\bar{x}}}{(n\bar{x})!} \times \frac{(n\bar{x})!}{x_1!\,x_2!\cdots x_n!}\left(\frac{1}{n}\right)^{x_1}\left(\frac{1}{n}\right)^{x_2}\cdots\left(\frac{1}{n}\right)^{x_n},$$

of which the first factor represents the probability that the actual total nx̄ should have been scored, and the second factor the probability, given this total, that the partition of it among the n observations should be that actually observed. In the latter factor, m, the parameter sought, does not appear. Hence x̄ is a sufficient statistic for m.
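The factorization can be checked numerically: dividing the probability of an ordered Poisson sample by the Poisson(nm) probability of its total should give the same number for every m, namely the multinomial probability in the second factor. The sample values and function names in this sketch are illustrative.

```python
from math import exp, factorial, prod


def poisson_pmf(k, mean):
    """e^(-mean) * mean^k / k!"""
    return exp(-mean) * mean ** k / factorial(k)


def conditional_prob_given_total(sample, m):
    """Pr{this ordered sample} / Pr{total}, for independent Poisson(m) observations."""
    n, total = len(sample), sum(sample)
    joint = prod(poisson_pmf(x, m) for x in sample)
    return joint / poisson_pmf(total, n * m)    # the total n*xbar is Poisson(n*m)


sample = [2, 0, 3, 1]                            # n = 4, total 6
a = conditional_prob_given_total(sample, 0.7)
b = conditional_prob_given_total(sample, 4.2)
# second factor of the text: (n xbar)! / (x1! ... xn!) * (1/n)^(x1 + ... + xn)
multinomial = factorial(6) / prod(factorial(x) for x in sample) / 4 ** 6
print(a, b, multinomial)                         # the three values agree; m has dropped out
```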

a. Definition. Suppose a population has a probability density f(x; θ), where θ is a parameter. Let (x₁, x₂, …, xₙ) be a random sample. If t(x₁, x₂, …, xₙ) is a function (a random variable with its own probability law) such that the conditional probability density function of (x₁, x₂, …, xₙ) for any fixed value of t(x₁, x₂, …, xₙ) does not depend on θ, then t(x₁, x₂, …, xₙ) is a sufficient statistic, and it or some simple function of it will be a sufficient estimator for θ.

This means that if

$$g\big((x_1, x_2, \ldots, x_n) \mid t(x_1, x_2, \ldots, x_n)\big)$$

does not depend on θ, then t is sufficient.


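2. Illustration. Let

$$f(x; \theta) = \begin{cases} \theta e^{-x\theta}, & x \ge 0, \\ 0, & x < 0. \end{cases}$$

Take a random sample (x₁, x₂, …, xₙ). It has probability density function

$$\theta^n e^{-\theta\sum x_i}.$$

Let t(x₁, x₂, …, xₙ) = Σxᵢ. Then

$$\theta^n e^{-\theta t} = \left[\frac{\theta^n t^{n-1}e^{-\theta t}}{(n - 1)!}\right] \times \left[\frac{(n - 1)!}{t^{n-1}}\right] = [p(t, \theta)] \times [g((x_1, x_2, \ldots, x_n) \mid t)].$$

Since g(x₁, x₂, …, xₙ | t) is (n − 1)!/t^{n−1}, which is constant for fixed t and free of θ, we know t is sufficient. In this case we can see the geometry for small size samples, viz., as in Figure 20.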
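The same check works for this continuous case. The first bracket above is the gamma density of t = Σxᵢ, so dividing the joint density by it should leave (n − 1)!/t^{n−1} whatever θ is. The sample values and function names below are illustrative.

```python
from math import exp, factorial


def joint_density(xs, theta):
    """theta^n * exp(-theta * sum(xs)) for a sample from theta * exp(-theta * x), x >= 0."""
    return theta ** len(xs) * exp(-theta * sum(xs))


def density_of_total(t, n, theta):
    """Density of t = x1 + ... + xn: theta^n t^(n-1) e^(-theta t) / (n - 1)!."""
    return theta ** n * t ** (n - 1) * exp(-theta * t) / factorial(n - 1)


xs = [0.4, 1.1, 0.25]
t, n = sum(xs), len(xs)
expected = factorial(n - 1) / t ** (n - 1)       # (n - 1)! / t^(n - 1)
g_values = [joint_density(xs, th) / density_of_total(t, n, th) for th in (0.5, 2.0, 3.7)]
print(g_values, expected)                        # g is the same for every theta
```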

This idea of sufficiency essentially says that in the space of samples (x₁, x₂, …, xₙ) you take a "slice" in such a way that there is a fixed probability all over this slice. Then any function of sample values on this slice has nothing to do with the parameter. All information about the parameter is obtained by going from one slice to another and not within a slice. Incidentally, unbiasedness is not related to this. In the last illustration, as well as in many others, we could take Σxᵢ or Σxᵢ/n for a sufficient statistic. Usually a simple transformation makes it a sufficient estimator. Unfortunately, sufficient estimators are the exceptions rather than the rule. In practice we have to be content with less satisfactory estimators. However, when a sufficient estimator exists and a most efficient estimator exists, we know the sufficient estimator is most efficient.


Figure  20 


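3. Illustration. Let $f(x; \theta) = \theta^x(1 - \theta)^{1-x}$, x = 0, 1. Then for a random sample (x₁, x₂, …, xₙ) we have

$$\begin{aligned}
f(x_1, x_2, \ldots, x_n) &= f(x_1; \theta) \times f(x_2; \theta) \cdots f(x_n; \theta) \\
&= \theta^{\sum x_i}(1 - \theta)^{n - \sum x_i} \\
&= \left[\binom{n}{t}\theta^t(1 - \theta)^{n-t}\right] \times \left[\frac{1}{\binom{n}{t}}\right] \\
&= [p(t, \theta)] \times \left[g\big((x_1, x_2, \ldots, x_n) \mid \textstyle\sum x_i = t\big)\right],
\end{aligned}$$

which again shows t = Σxᵢ is sufficient for θ.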

So we see that if a statistic t is sufficient for θ, it means that the conditional distribution of any other statistic y, given t = t′, does not depend on the parameter θ. Consequently, when we know t = t′, it is impossible to use y to make a statistical inference about θ. For example, you cannot then use y to find a confidence interval for θ. We might try to show that x̄ is a sufficient estimator for the mean θ of the distribution N(θ, 1).


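4. Illustration. Let

$$f(x; \theta) = \frac{1}{\sqrt{2\pi}}\,e^{-(x - \theta)^2/2}.$$

Then

$$f(x_1, x_2, \ldots, x_n; \theta) = f(x_1; \theta)f(x_2; \theta)\cdots f(x_n; \theta) = \left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\sum(x_i - \theta)^2/2}.$$

Now if we expand the numerator in the exponent on e, we get

$$\sum x_i^2 - 2n\bar{x}\theta + n\theta^2.$$

Next, if we use the identity Σxᵢ² = nx̄² + Σ(xᵢ − x̄)² to replace Σxᵢ² in the last expression, it becomes

$$\begin{aligned}
&n\bar{x}^2 + \sum(x_i - \bar{x})^2 - 2n\bar{x}\theta + n\theta^2 \\
&= \sum(x_i - \bar{x})^2 + n\bar{x}^2 - 2n\bar{x}\theta + n\theta^2 \\
&= n(\bar{x} - \theta)^2 + \sum(x_i - \bar{x})^2.
\end{aligned}$$

Therefore

$$f(x_1, x_2, \ldots, x_n; \theta) = \left[e^{-n(\bar{x} - \theta)^2/2}\right] \times \left[\left(\frac{1}{\sqrt{2\pi}}\right)^n e^{-\sum(x_i - \bar{x})^2/2}\right],$$

where the first factor depends on the sample only through x̄ and the second does not involve θ, and so x̄ is a sufficient estimator for θ.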
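A numerical check of this step: the sketch verifies the sum-of-squares identity and shows that two samples with the same x̄ have log-likelihoods differing by a constant that does not change with θ, which is what sufficiency of x̄ means here. The sample values and function names are illustrative.

```python
from math import log, pi
from statistics import fmean


def log_likelihood(xs, theta):
    """ln f(x1, ..., xn; theta) for independent N(theta, 1) observations."""
    return -0.5 * len(xs) * log(2 * pi) - 0.5 * sum((x - theta) ** 2 for x in xs)


def split_sum_of_squares(xs, theta):
    """Both sides of sum (xi - theta)^2 = n(xbar - theta)^2 + sum (xi - xbar)^2."""
    n, xbar = len(xs), fmean(xs)
    left = sum((x - theta) ** 2 for x in xs)
    right = n * (xbar - theta) ** 2 + sum((x - xbar) ** 2 for x in xs)
    return left, right


xs = [1.3, -0.2, 0.8, 2.1, 0.5]   # sample mean 0.9
ys = [0.9] * 5                    # another sample with the same mean
rows = []
for theta in (-1.0, 0.0, 0.9, 3.0):
    left, right = split_sum_of_squares(xs, theta)
    gap = log_likelihood(xs, theta) - log_likelihood(ys, theta)
    rows.append((theta, left, right, gap))
    print(theta, left, right, gap)   # left equals right; gap stays at -1.49 for every theta
```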

E.  Maximum  Likelihood. 

In 1922 Fisher introduced his Method of Maximum Likelihood to provide a statistic that was sufficient. The likelihood function L is the compound probability function (or density function, in the case of a continuous distribution) of a specific observed sample, i.e.,

$$L = f(x_1; \theta)f(x_2; \theta)\cdots f(x_n; \theta)$$

for a sample (x₁, x₂, …, xₙ) from f(x; θ). Since the logarithm of L is maximum for the same value of θ that maximizes L, and since the logarithm of a product is easier to differentiate, we set

$$\frac{\partial}{\partial\theta}\log L$$

equal to zero and solve for θ in terms of our sample values. Note this L is not a probability, as it does not obey the laws of probability with respect to θ.

When ∂ log L/∂θ is the same function for all samples yielding the same estimate θ̂, then a sufficient statistic exists.

The condition that ∂ log L/∂θ should be constant over the same sets of samples for all values of θ, which has been shown to establish the existence of a sufficient estimate of θ, thus requires that the likelihood is a function of θ which, apart from a factor dependent on the sample, is of the same form for all samples yielding the same estimate θ̂. The sufficiency of sufficient statistics may thus be traced to the fact that in such cases the value of θ̂ itself alone determines the form of the likelihood as a function of θ.

1. Illustration. A sample (x₁, x₂, …, xₙ) of n demands comes at random from the exponential distribution

$$f(x) = ke^{-kx}, \quad 0 < x < \infty.$$

Then

$$L = k^n e^{-k(x_1 + x_2 + \cdots + x_n)},$$

$$\frac{\partial}{\partial k}\ln L = \frac{n}{k} - (x_1 + x_2 + \cdots + x_n),$$

which when set equal to zero gives

$$\hat{k} = \frac{n}{x_1 + x_2 + \cdots + x_n} = \frac{1}{\bar{x}}.$$

Thus the sample mean x̄ is the maximum likelihood estimator for the population mean 1/k.
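The closed-form answer k̂ = 1/x̄ can be confirmed by maximizing ln L = n ln k − kΣxᵢ numerically. The sketch uses a golden-section search, a standard one-dimensional maximizer chosen here for illustration and not part of the text, on made-up demand data; the function names are ours.

```python
from math import log


def log_likelihood(k, xs):
    """ln L = n ln k - k * sum(xs) for the exponential density k e^(-k x)."""
    return len(xs) * log(k) - k * sum(xs)


def golden_section_max(f, lo, hi, iters=200):
    """Locate the maximum of a unimodal f on [lo, hi] by golden-section search."""
    inv_phi = (5 ** 0.5 - 1) / 2          # about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    for _ in range(iters):
        if f(c) > f(d):                   # maximum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:                             # maximum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2


demands = [3.0, 1.5, 4.2, 0.8, 2.5]       # illustrative data
k_hat = golden_section_max(lambda k: log_likelihood(k, demands), 1e-6, 10.0)
print(k_hat, len(demands) / sum(demands))   # numerical maximum vs closed form 1/x-bar
```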
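2. Illustration. Suppose the random sample (x₁, x₂, …, xₙ) comes from the normal distribution N(μ, σ²). Further suppose

a. σ is known and μ is unknown. Then

$$L = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n \exp\left[-\frac{\sum(x_i - \mu)^2}{2\sigma^2}\right],$$

$$\ln L = -\frac{n}{2}\ln 2\pi - n\ln\sigma - \frac{1}{2\sigma^2}\sum(x_i - \mu)^2,$$

$$\frac{\partial\ln L}{\partial\mu} = \frac{1}{\sigma^2}\sum(x_i - \mu),$$

which when set equal to zero yields

$$\hat{\mu} = \frac{\sum x_i}{n} = \bar{x}.$$

b. μ is known and σ is unknown. Then

$$\frac{\partial\ln L}{\partial\sigma} = -\frac{n}{\sigma} + \frac{\sum(x_i - \mu)^2}{\sigma^3},$$

which when set equal to zero yields

$$\hat{\sigma}^2 = \frac{\sum(x_i - \mu)^2}{n}.$$

c. μ and σ are both unknown. Now we must solve simultaneously

$$\begin{cases}
\dfrac{\partial\ln L}{\partial\sigma^2} = -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4}\sum(x_i - \mu)^2 = 0, \\[2ex]
\dfrac{\partial\ln L}{\partial\mu} = -\dfrac{1}{2\sigma^2}\sum 2(x_i - \mu)(-1) = 0.
\end{cases}$$

From the second equation we get

$$\hat{\mu} = \frac{\sum x_i}{n} = \bar{x}.$$

Substituting this x̄ for μ in the first equation yields

$$\hat{\sigma}^2 = \frac{\sum(x_i - \bar{x})^2}{n}.$$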
Note the estimate μ̂ is unbiased but the estimate σ̂² is biased. However, by multiplying by the constant factor n/(n − 1) we can make the latter estimate unbiased. Incidentally, the corresponding estimate for σ² in 2b is not biased, since μ and not x̄ is subtracted from each xᵢ.
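The case c estimates and the n/(n − 1) correction can be computed directly. For comparison, Python's `statistics.pvariance` and `statistics.variance` use the n and n − 1 divisors, respectively. The data values are illustrative.

```python
from statistics import fmean, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mu_hat = fmean(data)                                     # case c: x-bar
sigma2_hat = sum((x - mu_hat) ** 2 for x in data) / n    # M.L.E. of sigma^2, biased
sigma2_unbiased = sigma2_hat * n / (n - 1)               # multiplied by n/(n - 1)
print(mu_hat, sigma2_hat, sigma2_unbiased)               # 5.0 4.0 4.571428571428571
print(pvariance(data), variance(data))                   # n and n - 1 divisor versions
```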

F.  Relation  of  Maximum  Likelihood  to  Sufficiency. 

For unbiased estimators you need consider only those estimates based on (but not necessarily equal to) a sufficient statistic. The sufficient statistic may be a biased estimate, but this is easily adjusted, as you have seen. The remarkable thing is that for many problems there is at most one unbiased estimator based on a sufficient statistic.

Now if a problem has a sufficient statistic, then the maximum likelihood estimator is based on that sufficient statistic. Before showing this, let us recognize an alternate definition of sufficiency in the

1. Theorem. A statistic t(x₁, x₂, …, xₙ) is sufficient for the one-parameter family f(x; θ) if and only if the sample probability function or probability density function can be factored

$$f(x_1, x_2, \ldots, x_n; \theta) = p(t; \theta) \times k(x_1, x_2, \ldots, x_n)$$

into two parts (often two distributions), one dependent only on the statistic and the parameter, the second independent of the parameter.

We  can  state  this  more  generally  for  two  or  more  parameters. 

Though we have already said this "hunt-and-peck" system is not desirable for locating a sufficient statistic, it is for the moment to be recognized that the factorization says that the variation of the probability with θ is tied to the statistic t, and that any other variation is independent of θ.
Now let's use this to show the following.

2. Theorem. If a sufficient statistic exists, then the maximum likelihood estimate is based on it.

a. Argument. Let t(x1, x2, ..., xn) be the sufficient statistic. Then we know by hypothesis that

$$f(x_1, x_2, \ldots, x_n; \theta) = p(t; \theta)\, h(x_1, x_2, \ldots, x_n).$$

The equation for the maximum likelihood estimate is

$$\frac{\partial}{\partial \theta}\left[p(t; \theta)\, h(x_1, x_2, \ldots, x_n)\right] = 0$$

or, since h does not involve θ,

$$\frac{\partial}{\partial \theta}\, p(t; \theta) = 0,$$

which, when solved for the maximizing θ̂, produces an estimate that depends only on t.
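A small numerical check of the theorem, offered as a sketch: assuming the Bernoulli family f(x; θ) = θ^x(1 − θ)^(1−x) that appears later in this chapter, the log likelihood depends on the data only through t = Σxi, so two different samples with the same t must give the same maximizing θ. The grid search and the names `bernoulli_log_likelihood` and `grid_mle` are illustrative stand-ins for solving the derivative equation, not part of the original lectures.

```python
import math


def bernoulli_log_likelihood(theta, sample):
    """ln L for f(x; theta) = theta**x * (1 - theta)**(1 - x), x in {0, 1}.

    The data enter only through t = sum(sample), the sufficient statistic,
    because ln L = t ln(theta) + (n - t) ln(1 - theta).
    """
    t, n = sum(sample), len(sample)
    return t * math.log(theta) + (n - t) * math.log(1 - theta)


def grid_mle(sample, steps=999):
    """Crude maximization of ln L over theta = 0.001, 0.002, ..., 0.999."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda th: bernoulli_log_likelihood(th, sample))


a = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]  # a different sample with the same t = 3
print(grid_mle(a), grid_mle(b))      # both maximize at t/n = 0.3
```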

G. Normality of M. L. E. for Large Samples.

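Before establishing the type of the distribution of the M. L. E. (Maximum Likelihood Estimator), let us calculate two expectations.

Consider ∂ ln f(x; θ)/∂θ. Note its mean value is zero, viz.,

$$
\begin{aligned}
E\left[\frac{\partial}{\partial\theta}\ln f(x;\theta)\right]
&= \int_{-\infty}^{+\infty} \frac{\partial \ln f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx \\
&= \int_{-\infty}^{+\infty} \frac{1}{f(x;\theta)}\,\frac{\partial f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx \\
&= \int_{-\infty}^{+\infty} \frac{\partial f(x;\theta)}{\partial\theta}\,dx
 = \frac{\partial}{\partial\theta}\int_{-\infty}^{+\infty} f(x;\theta)\,dx
 = \frac{\partial}{\partial\theta}(1) = 0.
\end{aligned}
$$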

Next consider its variance. To avoid a lot of symbols, let S stand for ∂ ln f(x; θ)/∂θ. Then

$$\sigma_S^2 = E(S^2) = \int_{-\infty}^{+\infty}\left[\frac{\partial}{\partial\theta}\ln f(x;\theta)\right]^2 f(x;\theta)\,dx.$$


This function S, its mean, and its variance play an important role in our work, as we shall see with its variance in the next section. Right now we further realize that the sum ΣS(xi; θ), which we set equal to zero to get the M. L. E. θ̂, is a sum of independent and identically distributed random variables and hence has a limiting normal distribution with 0 for its mean and nσ_S² for its variance. So, for large values of n, θ̂ is close to θ, and in general there is an approximately linear relation between ΣS(xi; θ) and θ̂ − θ.

Another way of saying this is

$$E\left[\sum S(x_i;\theta)\right] = 0 \quad\text{and}\quad \hat\theta - \theta \approx C\left[\sum S(x_i;\theta)\right],$$

where the constant C is approximately 1/(nσ_S²), so that

$$\sum S(x_i;\theta) \sim N(0,\, n\sigma_S^2) \quad\text{and}\quad \hat\theta \sim N\!\left(\theta,\, \frac{1}{n\sigma_S^2}\right).$$

Later we shall see that θ̂ has minimum variance. Before that, let us consider a statement of great content.

H. Information (Fréchet, Cramér-Rao) Inequality.

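Suppose x1 is a sample of size one from a probability density function f(x; θ). Let r(x1) be any unbiased estimator of θ. If

$$\frac{\partial}{\partial\theta}\int_{-\infty}^{+\infty} f(x;\theta)\,dx = \int_{-\infty}^{+\infty}\frac{\partial f(x;\theta)}{\partial\theta}\,dx,$$

then

$$\operatorname{var} r(x_1) \ge \frac{1}{E\left[\dfrac{\partial \ln f(x_1;\theta)}{\partial\theta}\right]^2}.$$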

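To derive this result, let us drop the subscript on x1 and proceed as follows. By definition

$$\int [r(x) - \theta]\, f(x;\theta)\,dx = 0.$$

Differentiation of the last expression with respect to θ gives

$$\int [r(x) - \theta]\,\frac{\partial f(x;\theta)}{\partial\theta}\,dx - \int f(x;\theta)\,dx = 0.$$

Therefore, since the second integral equals 1,

$$\int [r(x) - \theta]\,\frac{\partial \ln f(x;\theta)}{\partial\theta}\, f(x;\theta)\,dx = 1.$$

Using Schwarz's Inequality, which in general is

$$\int g^2(x)\,dx \cdot \int h^2(x)\,dx \ge \left[\int g(x)\,h(x)\,dx\right]^2,$$

with g(x) = [r(x) − θ]√f(x; θ) and h(x) = [∂ ln f(x; θ)/∂θ]√f(x; θ), we get

$$\int [r(x) - \theta]^2 f(x;\theta)\,dx \times \int \left[\frac{\partial \ln f(x;\theta)}{\partial\theta}\right]^2 f(x;\theta)\,dx \ge 1,$$

that is,

$$\operatorname{var} r(x) \ge \frac{1}{E\left[\dfrac{\partial \ln f(x;\theta)}{\partial\theta}\right]^2}.$$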

Therefore we know x̄ is the best estimator of θ in the sense of having minimum variance.

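Let's look at a discrete case,

$$f(x;\theta) = \theta^x (1-\theta)^{1-x}, \qquad x = 0, 1.$$

Then

$$\ln f(x;\theta) = x\ln\theta + (1-x)\ln(1-\theta),$$

$$\frac{\partial}{\partial\theta}\ln f(x;\theta) = \frac{x}{\theta} - \frac{1-x}{1-\theta} = \frac{x-\theta}{\theta(1-\theta)},$$

and so

$$\sum_{i=1}^{n}\frac{\partial}{\partial\theta}\ln f(x_i;\theta) = \frac{1}{\theta(1-\theta)}\left[\sum x_i - n\theta\right] = \frac{n}{\theta(1-\theta)}\left[\frac{\sum x_i}{n} - \theta\right].$$

Once again (Σxi)/n shows up as a good estimator. Since Σxi is the number of ones, or occurrences, this proportion is the best estimator for θ.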
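The identity for the summed score can be verified exactly with rational arithmetic. This sketch uses Python's `fractions.Fraction` and an arbitrary illustrative sample; it checks the closed form n/(θ(1 − θ))·(x̄ − θ) at a few values of θ and confirms that the sum vanishes at θ = x̄.

```python
from fractions import Fraction


def total_score(sample, theta):
    # Sum over the sample of d/dtheta ln f(x_i; theta) = x/theta - (1 - x)/(1 - theta)
    return sum(Fraction(x) / theta - Fraction(1 - x) / (1 - theta) for x in sample)


sample = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
n, xbar = len(sample), Fraction(sum(sample), len(sample))
for theta in (Fraction(1, 10), Fraction(1, 4), Fraction(2, 3)):
    # Closed form from the text: n / (theta (1 - theta)) * (xbar - theta)
    assert total_score(sample, theta) == n / (theta * (1 - theta)) * (xbar - theta)

print(total_score(sample, xbar))  # 0: the summed score vanishes at theta = xbar
```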


Incidentally, if we know we have the lower bound of variance for our statistic, this theory gives us a quick way to calculate it. For in the normal example the bound is

$$\frac{1}{n(1)} = \frac{1}{n}$$

and in the second example it is

$$\frac{1}{nE\left[\dfrac{x}{\theta} - \dfrac{1-x}{1-\theta}\right]^2} = \frac{1}{n\left[\dfrac{1}{\theta(1-\theta)}\right]} = \frac{\theta(1-\theta)}{n},$$

which is the usual pq/n form.
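The "quick way" to the lower bound amounts to computing σ_S² = E(S²) by summing over the support. A minimal sketch for the Bernoulli case, assuming a discrete support small enough to enumerate; the helper names are hypothetical.

```python
def fisher_information(pmf, dlogf, support, theta):
    # sigma_S^2 = E[(d/dtheta ln f)^2] = sum over the support of (dlogf)^2 * f
    return sum(dlogf(x, theta) ** 2 * pmf(x, theta) for x in support)


def bernoulli_pmf(x, theta):
    return theta**x * (1 - theta) ** (1 - x)


def bernoulli_dlogf(x, theta):
    return x / theta - (1 - x) / (1 - theta)


def cramer_rao_bound(theta, n):
    # 1 / (n sigma_S^2); for the Bernoulli family this should equal theta(1 - theta)/n
    info = fisher_information(bernoulli_pmf, bernoulli_dlogf, (0, 1), theta)
    return 1.0 / (n * info)


print(cramer_rao_bound(0.3, 50), 0.3 * 0.7 / 50)  # both about 0.0042
```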


In conclusion we can say that the minimum-variance estimate of a parameter is the unbiased estimate that is based on the sufficient statistic, when such exists. And the maximum likelihood technique finds it! In any case, for large samples the M. L. E. has minimum variance.

I.  Shortest  Average  Confidence  Limits  for  Large  Samples. 

Though we have been concentrating on point estimation, it is proper to discuss this aspect of confidence interval estimation theory here, since our friend S(xi; θ) plays a role in it.

It seems natural enough to want our confidence interval for a population parameter as short as possible in some average sense. Generally this cannot be arranged except in large samples for certain population distributions. Rather than state for such cases a general theorem whose proof is more complicated than we care to discuss in these lectures, let us simply illustrate by taking the simple Bernoulli distribution

$$f_B(x;p) = p^x(1-p)^{1-x}, \qquad x = 0, 1.$$

Suppose we want 100c% confidence limits for p from a sample (y1, y2, ..., yn). Then

$$\int_{y_L}^{y_U} f_B(x;p)\,dx = \text{desired probability} = f(p).$$

We write the integral here to be general, though our example would call for a Σ. To get the maximum probability for an interval is to get the smallest interval for a fixed probability. So let's differentiate our last integral with respect to p and set the result equal to zero, viz.,

$$\frac{\partial}{\partial p}\int f(x;p)\,dx = \int \frac{\partial}{\partial p} f(x;p)\,dx = 0.$$

Since we want an expectation (average), we want our integral to have the factor f(x; p). Hence we rewrite it

$$\int \frac{\partial f(x;p)}{\partial p}\,dx = \int \left[\frac{\partial \log f(x;p)}{\partial p}\right] f(x;p)\,dx = 0.$$


Another way of introducing the importance of the log here is as follows.

1. (y1, y2, ..., yn) is a random sample.

2. f(y1; p) × f(y2; p) × ... × f(yn; p) is its probability.

3. To maximize it (a minimum is obviously an end condition) we set the derivative with respect to p equal to zero, viz.,

$$\frac{\partial}{\partial p}\left[f(y_1;p)\times f(y_2;p)\times\cdots\times f(y_n;p)\right] = 0,$$

and to get this into expectation form we write it

$$\left[\sum_{i=1}^{n}\frac{\partial \log f(y_i;p)}{\partial p}\right]\left[f(y_1;p)\times f(y_2;p)\times\cdots\times f(y_n;p)\right] = 0.$$

This requires the bracketed sum to be zero.

Now to go back to our original plan and go on from there we need the additional fact mentioned earlier, namely, that

$$Q = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial \log f(y_i;p)}{\partial p}$$

is approximately normally distributed with zero mean and with variance

$$\sigma_Q^2 = \frac{1}{n}\,E\left[\frac{\partial \log f(y;p)}{\partial p}\right]^2.$$

Hence approximate 100c% confidence intervals are obtained by setting

$$\frac{Q}{\sigma_Q} = \pm z_c$$

and solving for p. This interval is the smallest, on the average, for large samples.

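Now for our Bernoulli distribution we follow this through.

$$\frac{\partial \log f(x;p)}{\partial p} = \frac{\partial}{\partial p}\left[x\log p + (1-x)\log(1-p)\right] = \frac{x}{p} - \frac{1-x}{1-p}.$$

Next

$$E\left[\frac{\partial \log f(x;p)}{\partial p}\right]^2 = E\left[\frac{x}{p} - \frac{1-x}{1-p}\right]^2 = \sum_{x=0}^{1}\left[\frac{x}{p} - \frac{1-x}{1-p}\right]^2 p^x(1-p)^{1-x} = \frac{1}{p(1-p)}.$$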

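Therefore

$$Q = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{y_i}{p} - \frac{1-y_i}{1-p}\right] = \frac{\hat p - p}{p(1-p)},$$

where

$$\hat p = \left(\sum y_i\right)/n.$$

Since

$$\sigma_Q^2 = \frac{1}{n}\times\frac{1}{p(1-p)},$$

we have

$$\frac{Q}{\sigma_Q} = \frac{\hat p - p}{\sqrt{p(1-p)/n}},$$

and the 100c% confidence limits are obtained from solving

$$\frac{(\hat p - p)\sqrt{n}}{\sqrt{p(1-p)}} = \pm z_c,$$

which is exactly the same result as that given on page 13.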
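Solving (p̂ − p)√n/√(p(1 − p)) = ±z_c for p means solving a quadratic in p, whose two roots are the limits; this interval is what is nowadays usually called the Wilson score interval. The sketch below assumes z_c is supplied directly (1.96 for c = .95) and uses an illustrative 50 successes in 100 trials; `score_interval` is a hypothetical name.

```python
import math


def score_interval(successes, n, z):
    # Solve n (phat - p)^2 = z^2 p (1 - p) for p, i.e. the quadratic
    # (1 + z^2/n) p^2 - (2 phat + z^2/n) p + phat^2 = 0.
    phat = successes / n
    a = 1 + z * z / n
    b = -(2 * phat + z * z / n)
    c = phat * phat
    root = math.sqrt(b * b - 4 * a * c)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


lo, hi = score_interval(50, 100, 1.96)
print(round(lo, 4), round(hi, 4))  # roughly 0.404 and 0.596
```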




VI.  ORDER  STATISTICS 


When one tests the life of a sample of n items, it is obvious that if ti denotes the time when the i-th one fails, then the data occur in a way that their serial order also gives them in order of increasing magnitude, i.e., t1 ≤ t2 ≤ ... ≤ tn. We say that the sample values are ordered by size in time. However, not all samples' values have this property, so we consider rearranging the values in the sample (x1, x2, ..., xn) in increasing order of magnitude and then denoting this array by

$$(x_{(1)}, x_{(2)}, \ldots, x_{(n)}).$$

Consider all samples of size n from a base population. Then the smallest value in a sample varies randomly from sample to sample. So does the next-to-smallest, etc. Hence we have n new random variables, each of which is called an order statistic, since they are functions of a sample. We say x(1) is the first order statistic, while x(k) is called the k-th order statistic. Now remember

$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)},$$

and so these new random variables are not independent. They are dependent in the strongest sense, namely, pairwise.

A.  Typical  Order  Statistic  Distribution. 

Let x denote a random variable with continuous density function f(x), −∞ < x < ∞, and for n = 5 let our random sample be (x1, x2, ..., x5). Consider, say, the fourth order statistic, x(4). Now for a particular sample, x(4) might be any one of the five serially-ordered sample values, and, moreover, it can be any value in the domain of the random variable x. Suppose we say it is a particular value and denote this value by x'(4). Then what is the probability that the fourth order statistic will have a value in the interval (x'(4), x'(4) + Δx(4))?

More generally, let A be the event that a sample value lies in the interval (−∞, x'(4)), B be the event that a sample value lies in the interval (x'(4), x'(4) + Δx(4)), and C be the event that a sample value lies in the interval (x'(4) + Δx(4), ∞).

Now we ask how many equally likely samples satisfy the compound event A and A and A and B and C, whose probabilities, suggested by Figure 21,

Figure 21

are

$$\Pr\{A\} = \int_{-\infty}^{x'_{(4)}} f(x)\,dx = F(x'_{(4)}),$$

$$\Pr\{B\} = \int_{x'_{(4)}}^{x'_{(4)}+\Delta x} f(x)\,dx = f(x'_{(4)} + \theta\Delta x)\,\Delta x, \qquad 0 < \theta < 1,$$

$$\Pr\{C\} = \int_{x'_{(4)}+\Delta x}^{\infty} f(x)\,dx = 1 - F(x'_{(4)} + \Delta x).$$

Since eventually Δx(4) = Δx will go to zero, we may as well assume now that it is small enough to assure us that x(5) is greater than or equal to x'(4) + Δx. Then we can say that for any such random sample (x1, x2, ..., x5), event A occurs three times, since three of our five observations must be less than the fourth order statistic; event B, the fourth order statistic's range, occurs once, as does event C, the fifth order statistic's range. We can indicate this by putting the numbers 3, 1, 1 in the three regions as shown in Figure 21. Hence such a typical sample would give the compound event of 3 A's, 1 B, and 1 C, which in turn has the probability

$$[\Pr\{A\}]^3\,[\Pr\{B\}]^1\,[\Pr\{C\}]^1.$$


Once again we ask how many equally likely serially-ordered samples for each fixed set of five numerically ordered values would give this same situation. Well, let's first suppose x'(4) = x1, i.e., that the fourth smallest observation is the first observation.

Table XV lists the various different serial-numberings of these values which satisfy our requirement.

So there are four equally likely but different serial-numberings for our set of five values that give x1 to be x'(4). Similarly we would find four equally likely but different ones for each of x2, x3, x4, x5 to be x'(4).
Therefore we would have to multiply the probability of 3 A's, 1 B, and 1 C by 5 × 4 = 20 to get the probability of x(4) being in the interval (x'(4), x'(4) + Δx). This can be written

$$\Pr\{x_{(4)} < x'_{(4)} + \Delta x\} - \Pr\{x_{(4)} < x'_{(4)}\} = 20\,[F(x'_{(4)})]^3\,[f(x'_{(4)} + \theta\Delta x)]\,\Delta x\,[1 - F(x'_{(4)} + \Delta x)]^1, \qquad 0 < \theta < 1.$$

Now divide both sides of the last equation by Δx, and then let Δx → 0. By definition the left member becomes the density function for x(4). If we denote it by g(x(4)), then we have, dropping the prime on x'(4),

$$g(x_{(4)}) = 20\,[F(x_{(4)})]^3\,[1 - F(x_{(4)})]\,f(x_{(4)})$$

for all values of x(4) for which x is defined.
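Both ingredients of g(x(4)) can be checked mechanically. The sketch below counts, over all 5! serial orderings of five distinct values, the distinct ways of choosing which observation is x(4) and which is x(5), and then integrates 20F³(1 − F)f numerically for an assumed uniform parent on (0, 1), where F(u) = u and f(u) = 1. The sample values, the midpoint rule, and its step count are arbitrary choices.

```python
import itertools
import math

# Count distinct (observation that is x(4), observation that is x(5)) assignments
# over all 5! serial orderings of five distinct values.
values = [0.1, 0.2, 0.3, 0.4, 0.5]
pairs = set()
for perm in itertools.permutations(values):
    ranked = sorted(range(5), key=lambda i: perm[i])  # serial positions in order of size
    pairs.add((ranked[3], ranked[4]))                 # (position of x(4), position of x(5))
print(len(pairs), math.factorial(5) // math.factorial(3))  # 20 20


def g4_uniform(u):
    # g(x(4)) = 20 F^3 (1 - F) f with F(u) = u, f(u) = 1 on (0, 1)  (assumed parent)
    return 20 * u**3 * (1 - u)


m = 10_000
area = sum(g4_uniform((i + 0.5) / m) for i in range(m)) / m  # midpoint rule
print(round(area, 6))  # 1.0
```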

The probability density function just derived is readily obtainable as a particular application of the multinomial distribution. Just as we derived the binomial distribution by asking a question in a Bernoulli Process [Volume I, pages 73-75], we can obtain the multinomial distribution by asking a similar question in a more general process.


Table XV

Less Than x'(4)      x'(4)     Greater Than x'(4)
{x2, x3, x4}         x1        {x5}
{x3, x4, x5}         x1        {x2}
{x2, x4, x5}         x1        {x3}
{x2, x3, x5}         x1        {x4}

B.  The  Generalized  Bernoulli  Process. 

Suppose  we  have  a  process  with  the  following  characteristics: 

On a trial (in a sequence of trials) some one of k different events E1, E2, ..., Ek occurs;

The probability pi with which Ei may occur remains fixed trial after trial. Note that p1 + p2 + ... + pk = 1;

The trials are independent (i.e., the result on a trial is not affected by the result on a previous trial).

1. Question. In n successive trials, what is the probability of n1 occurrences of E1, n2 occurrences of E2, ..., nk occurrences of Ek? Note that n1 + n2 + ... + nk = n.

a. Argument. Proceeding as we did in the argument for the binomial distribution, we note there is a multiple random variable, or multiple real-valued function, in this question, and it counts the number of "successes" of each event Ei in an element of the sample space. The sample space consists of all n-tuple arrays e_r made up of any number of each of the Ei, with the total of such numbers being n, where e_r denotes the r-th series of trials. Therefore we shall call X(e_r) = "the number of E1's in e_r, the number of E2's in e_r, ..., the number of Ek's in e_r." Then X(e_r) can be any set of k integers in an ordered array where in each position there can be any integer from 0 to n.

In  order  to  describe  further  the  process  and  the  random  variable, 
we  introduce,  as  before,  the  probability  distribution  of  the  ordered  arrays 
of  numerical  outputs  of  this  random  variable.  By  way  of  illustration  we 
give  in  Table  XVI  the  development  for  the  situation  when  n  =  3  and  k  =  3. 

Table XVI

e_r                                          X(e_r) = (n1, n2, n3)   Pr{e_r}      Pr{X(e_r) = (n1, n2, n3)}
{E1, E1, E1}                                 (3, 0, 0)               p1^3         p1^3
{E2, E2, E2}                                 (0, 3, 0)               p2^3         p2^3
{E3, E3, E3}                                 (0, 0, 3)               p3^3         p3^3
{E1, E1, E2}, {E1, E2, E1}, {E2, E1, E1}     (2, 1, 0)               p1^2 p2      3 p1^2 p2
{E1, E1, E3}, {E1, E3, E1}, {E3, E1, E1}     (2, 0, 1)               p1^2 p3      3 p1^2 p3
{E2, E2, E1}, {E2, E1, E2}, {E1, E2, E2}     (1, 2, 0)               p1 p2^2      3 p1 p2^2
{E2, E2, E3}, {E2, E3, E2}, {E3, E2, E2}     (0, 2, 1)               p2^2 p3      3 p2^2 p3
{E3, E3, E1}, {E3, E1, E3}, {E1, E3, E3}     (1, 0, 2)               p1 p3^2      3 p1 p3^2
{E3, E3, E2}, {E3, E2, E3}, {E2, E3, E3}     (0, 1, 2)               p2 p3^2      3 p2 p3^2
{E1, E2, E3}, {E1, E3, E2}, {E2, E1, E3},    (1, 1, 1)               p1 p2 p3     6 p1 p2 p3
{E2, E3, E1}, {E3, E1, E2}, {E3, E2, E1}

Incidentally, we can generate by the multinomial theorem of algebra all the various probabilities by expanding

$$(p_1 + p_2 + p_3)^3$$

and we can get each one from the compact formula

$$f(n_1, n_2) = \frac{3!}{n_1!\,n_2!\,n_3!}\,p_1^{n_1} p_2^{n_2} p_3^{n_3},$$

where

$$n_i = 0, 1, 2, 3 \quad\text{and}\quad \sum_{i=1}^{3} n_i = 3.$$

We write the probability function as involving only 2 of the ni, since only 2 of them are functionally independent. Further examination of the table will indicate that the coefficient on the product of a particular set of powers n1, n2, n3 of the probabilities p1, p2, p3, respectively, is simply the number of permutations of three things taken three at a time when n1 are of one type, n2 of another type, and n3 of a third type [Volume I, pages 47-50].


In general, for n trials our probability rules tell us that a compound event which has n1 "E1's," n2 "E2's," ..., nk "Ek's" has probability

$$p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}.$$

Since there are P(n; n1, n2, ..., nk) = n!/(n1! n2! ··· nk!) ways of arranging this number of "E1's," "E2's," ..., "Ek's," we conclude

$$\Pr\{X(e_r) = (n_1, n_2, \ldots, n_k)\} = P(n; n_1, n_2, \ldots, n_k)\,p_1^{n_1} p_2^{n_2}\cdots p_k^{n_k},$$

where each ni can be any one of the values 0, 1, ..., n, with n1 + n2 + ... + nk = n.
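The general formula translates directly into code. A minimal sketch, with hypothetical helper names and illustrative probabilities held as exact fractions, that reproduces the n = 3, k = 3 case of Table XVI: ten outcome types whose probabilities sum to 1, with coefficients 3 and 6 as in the table.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def multinomial_coefficient(counts):
    # P(n; n1, ..., nk) = n! / (n1! n2! ... nk!)
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    return coeff


def multinomial_pmf(counts, probs):
    # Pr{X(e_r) = (n1, ..., nk)} = P(n; n1, ..., nk) * p1**n1 * ... * pk**nk
    return multinomial_coefficient(counts) * prod(p**c for p, c in zip(probs, counts))


# The n = 3, k = 3 case of Table XVI, with illustrative (assumed) probabilities.
probs = (Fraction(1, 5), Fraction(3, 10), Fraction(1, 2))
cells = [c for c in product(range(4), repeat=3) if sum(c) == 3]
total = sum(multinomial_pmf(c, probs) for c in cells)

print(len(cells), total)  # 10 outcome types; total probability 1
print(multinomial_coefficient((2, 1, 0)), multinomial_coefficient((1, 1, 1)))  # 3 6
```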


C.  Application  of  the  Multinomial  Distribution  to  the  Derivation  of  the 
Distribution  Function  of  an  Order  Statistic. 

Let us now reconsider Section A and the obtaining of the probability density function of x(4) from a sample of size n = 5. In picking x five times at random from a population with p.d.f. f(x), we want the selection to be such that

Event   Description of Event            Probability of Event    Nr of Occurrences
E1      x ∈ (−∞, x(4))                  F(x(4))                 n1 = 3
E2      x ∈ (x(4), x(4) + dx(4))        f(x(4)) dx(4)           n2 = 1
E3      x ∈ (x(4), ∞)                   1 − F(x(4))             n3 = 1

Substituting the corresponding probabilities in the multinomial probability distribution function we have

$$\frac{5!}{3!\,1!\,1!}\,[F(x_{(4)})]^3\,[f(x_{(4)})\,dx_{(4)}]^1\,[1 - F(x_{(4)})]^1,$$

which gives as the coefficient of dx(4) the probability density of the fourth smallest, or second largest, observation, as we found before in the first section of this chapter.

D.  Derivation  of  the  General  Order  Statistic. 

Consider the k-th order statistic from a random sample of size n from a population whose probability density is f(x). Just as we did in the previous section for the fourth order statistic from a sample of size 5, we invoke the multinomial theorem and probability function for the three events as shown in Figure 22.

Figure 22

The areas of the three regions into which different numbers of observations of our size n sample fall are the three probabilities for the events E1, E2, E3 as shown. Then event E1 occurs (k − 1) times, E2 once, and E3 (n − k) times.

So it follows that if g(x(k)) is the probability density function for x(k), then

$$g(x_{(k)})\,dx_{(k)} = \frac{n!}{(k-1)!\,(1)!\,(n-k)!}\,[F(x_{(k)})]^{k-1}\,[f(x_{(k)})\,dx_{(k)}]\,[1 - F(x_{(k)})]^{n-k}$$

or

$$g(x_{(k)}) = \frac{\Gamma(n+1)}{\Gamma(k)\,\Gamma(n-k+1)}\,[F(x_{(k)})]^{k-1}\,[1 - F(x_{(k)})]^{n-k}\,f(x_{(k)}).$$

It is interesting and useful to note that you can always write the cumulative distribution function G(x(k)) of a single order statistic as an incomplete Beta function in terms of the cumulative function F(x) of the random variable. You will recall that the Beta function β(m, n) is defined in terms of the Gamma function by

$$\beta(m, n) = \frac{\Gamma(m)\,\Gamma(n)}{\Gamma(m+n)} = \int_0^1 y^{m-1}(1-y)^{n-1}\,dy.$$

Now if you let y = F(x), then we can write

$$G(x_{(k)}) = \frac{1}{\beta(k,\, n-k+1)}\int_0^{y'} y^{k-1}(1-y)^{n-k}\,dy,$$

where

$$y' = F(x_{(k)}).$$

By letting k = 1 we have the distribution function of the smallest element in the sample, and when k = n we obtain that for the largest. These are sometimes referred to as Extreme Value Statistics.
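The incomplete Beta form has an equivalent binomial reading: x(k) ≤ x exactly when at least k of the n observations are ≤ x, so G also equals the sum over j = k, ..., n of C(n, j) y'^j (1 − y')^(n−j). The sketch below, with hypothetical function names, evaluates the integral by the midpoint rule and compares it with that binomial sum.

```python
from math import comb, gamma


def order_stat_cdf_beta(y_prime, k, n, m=20_000):
    # G = [1 / beta(k, n-k+1)] * integral_0^{y'} y**(k-1) * (1-y)**(n-k) dy  (midpoint rule)
    inv_beta = gamma(n + 1) / (gamma(k) * gamma(n - k + 1))
    h = y_prime / m
    total = 0.0
    for i in range(m):
        y = (i + 0.5) * h
        total += y ** (k - 1) * (1 - y) ** (n - k)
    return inv_beta * total * h


def order_stat_cdf_binomial(y_prime, k, n):
    # Pr{x(k) <= x} = Pr{at least k of the n observations are <= x}, where y' = F(x)
    return sum(comb(n, j) * y_prime**j * (1 - y_prime) ** (n - j) for j in range(k, n + 1))


for k in range(1, 6):
    print(k, round(order_stat_cdf_beta(0.4, k, 5), 6), round(order_stat_cdf_binomial(0.4, k, 5), 6))
```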

1. Illustration. Consider the sample (x1, x2, ..., x5) from f(x) = 1, 0 < x < 1. Then

$$g(x_{(1)})\,dx_{(1)} = \frac{5!}{0!\,1!\,4!}\left[\int_0^{x_{(1)}} 1\,dx\right]^0 \left(1\,dx_{(1)}\right)\left[\int_{x_{(1)}}^{1} 1\,dx\right]^4.$$

This reduces to

$$g(x_{(1)}) = 5(1 - x_{(1)})^4, \qquad 0 < x_{(1)} < 1.$$

Note that this distribution has a very high ordinate at x(1) = 0 and drops off rapidly as x(1) increases, reaching 0 when x(1) = 1. This is what you would get for a frequency distribution of the values of the smallest observation in repeated samples of size five.

On the other hand we get for the median, x(3),

$$h(x_{(3)}) = 30\,x_{(3)}^2\,(1 - x_{(3)})^2, \qquad 0 \le x_{(3)} \le 1,$$

which is symmetric about x(3) = 1/2 and has its highest value there, dropping off to zero as x(3) goes to zero or unity. These two order statistic distributions are shown in Figure 23.


Figure  23 
One would expect the smallest value to have the greatest chance of being small, while the median value would have little chance of being very small or very large. Rather, the median value in repeated samples ought to be more frequently near the median of the population.
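A quick simulation illustrates the two shapes. It assumes Python's `random` module as the source of uniform (0, 1) values and an arbitrary seed; the mean of the smallest of five values should be near 1/6 (the mean of 5(1 − x)^4), the mean of the median near 1/2, and the fraction of minima below 0.1 near 1 − 0.9^5, the integral of 5(1 − x)^4 from 0 to 0.1.

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed
mins, medians = [], []
for _ in range(50_000):
    s = sorted(rng.random() for _ in range(5))  # one sample of five, ordered by size
    mins.append(s[0])                            # x(1)
    medians.append(s[2])                         # x(3)

mean_min = statistics.fmean(mins)
mean_median = statistics.fmean(medians)
frac_below = sum(m < 0.1 for m in mins) / len(mins)
print(round(mean_min, 3), round(mean_median, 3))   # near 1/6 and 1/2
print(round(frac_below, 3), round(1 - 0.9**5, 5))  # near 0.40951
```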


E.  Maximum  and  Minimum  Order  Statistics. 


The probability element of the minimum order statistic, x(1), from a random sample of size n from f(x) is

$$n\left[\int_{x_{(1)}}^{\infty} f(t)\,dt\right]^{n-1} f(x_{(1)})\,dx_{(1)}$$

and of the maximum order statistic, x(n), is

$$n\left[\int_{-\infty}^{x_{(n)}} f(t)\,dt\right]^{n-1} f(x_{(n)})\,dx_{(n)}.$$

Next suppose x is uniformly distributed, 0 ≤ x ≤ 1. Then we find

$$g(x_{(1)}) = n(1 - x_{(1)})^{n-1}, \qquad 0 \le x_{(1)} \le 1.$$

Therefore the integral of g(x(1)) from x to 1 is (1 − x)^n, which is Pr{x(1) > x}. Therefore the cumulative distribution is

$$G(x) = \Pr\{x_{(1)} < x\} = 1 - (1 - x)^n, \qquad 0 \le x \le 1.$$

This is obvious from simpler considerations, since it is the probability that not all n values of the sample fall into the interval (x, 1). By elementary set reasoning we know this event is the complement of all values falling into the interval, which has the probability (1 − x)^n.

Similarly the probability element of the n-th order statistic for the uniform distribution over (0, 1) is

$$h(x_{(n)}) = n\,x_{(n)}^{n-1}, \qquad 0 \le x_{(n)} \le 1.$$

Integration from 0 to x of h(x(n)) gives x^n, which is Pr{x(n) < x}, the cumulative distribution H(x). However, we can get this directly for this simple base population, since it is the probability that all the n values of the sample fall into the interval (0, x).

1. Illustration. Let f(x) = 2x, 0 < x < 1, and consider the random sample (x1, x2, ..., x5). Then the schematic diagram

[Schematic: the interval (0, 1) marked 0 < x(1) < x(2) < x(3) < x(4) < x(5) < 1, with 1 observation below x(2), 1 at x(2), and 3 above it]

suggests the probability density function

$$g(x_{(2)})\,dx_{(2)} = \frac{5!}{1!\,1!\,3!}\left[\int_0^{x_{(2)}} f(x)\,dx\right]\left[\int_{x_{(2)}}^{x_{(2)}+dx} f(x)\,dx\right]\left[\int_{x_{(2)}}^{1} f(x)\,dx\right]^3,$$

that is,

$$g(x_{(2)}) = 40\,x_{(2)}^3\,(1 - x_{(2)}^2)^3, \qquad 0 \le x_{(2)} \le 1.$$

In a similar way we find

$$h(x_{(3)}) = 60\,x_{(3)}^5\,(1 - x_{(3)}^2)^2, \qquad 0 \le x_{(3)} \le 1,$$

$$j(x_{(5)}) = 10\,x_{(5)}^9, \qquad 0 \le x_{(5)} \le 1.$$
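The three densities can be checked against the general k-th order statistic formula with F(x) = x² and f(x) = 2x. The sketch compares them at a few points and integrates each numerically; the sample points and step count are arbitrary, and `order_density` is a hypothetical name.

```python
from math import factorial


def F(x):
    return x * x  # cdf of f(x) = 2x on (0, 1)


def f(x):
    return 2 * x


def order_density(x, k, n=5):
    # General k-th order statistic density: n!/((k-1)!(n-k)!) F^(k-1) (1-F)^(n-k) f
    c = factorial(n) // (factorial(k - 1) * factorial(n - k))
    return c * F(x) ** (k - 1) * (1 - F(x)) ** (n - k) * f(x)


closed_forms = {
    2: lambda x: 40 * x**3 * (1 - x**2) ** 3,
    3: lambda x: 60 * x**5 * (1 - x**2) ** 2,
    5: lambda x: 10 * x**9,
}

m = 10_000
results = {}
for k, g in closed_forms.items():
    agree = all(abs(g(x) - order_density(x, k)) < 1e-9 for x in (0.1, 0.35, 0.6, 0.95))
    area = sum(g((i + 0.5) / m) for i in range(m)) / m  # midpoint rule on (0, 1)
    results[k] = (agree, area)
    print(k, agree, round(area, 6))
```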


F.  Maximum  Likelihood  +  Order  Statistics. 


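1. Illustration. Suppose a random sample of size n is drawn from the exponential population with density function

$$f(x; \alpha, \beta) = \beta e^{-\beta(x-\alpha)}, \qquad \alpha \le x < \infty, \quad 0 < \beta.$$

Then the likelihood of the compound event of the n sample values is

$$L = \beta^n e^{-\beta\sum(x_i - \alpha)}, \qquad \ln L = n\ln\beta - \beta\sum(x_i - \alpha),$$

so that

$$\frac{\partial \ln L}{\partial \alpha} = n\beta, \qquad \frac{\partial \ln L}{\partial \beta} = \frac{n}{\beta} - \sum(x_i - \alpha),$$

provided all x are greater than or equal to α. Since the first derivative is positive, L increases with α; but α must be less than or equal to every value in the sample. Therefore the greatest value which α can take on, consistent with the sample values, is the least value in the sample, i.e., x(1). So the maximum likelihood estimator α̂ for α is the first order statistic.

Next, substituting x(1) for α in the second equation of the last pair of equations, set equal to zero, yields the maximum likelihood estimator for β,

$$\hat\beta = \frac{1}{\bar x - x_{(1)}}.$$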
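A sketch of the two estimators on a small illustrative data set; the function names are hypothetical. The log likelihood is set to −∞ when some observation falls below α, which encodes the constraint that drives α̂ to x(1).

```python
import math
import statistics


def shifted_exponential_mle(sample):
    # alpha_hat = x(1), the first order statistic; beta_hat = 1 / (xbar - x(1))
    alpha_hat = min(sample)
    beta_hat = 1.0 / (statistics.fmean(sample) - alpha_hat)
    return alpha_hat, beta_hat


def log_likelihood(alpha, beta, sample):
    # ln L = n ln(beta) - beta * sum(x_i - alpha), valid only if every x_i >= alpha
    if min(sample) < alpha or beta <= 0:
        return -math.inf
    return len(sample) * math.log(beta) - beta * sum(x - alpha for x in sample)


data = [1.0, 2.0, 6.0]
a_hat, b_hat = shifted_exponential_mle(data)
print(a_hat, b_hat)  # 1.0 0.5
```

Perturbing either estimate (raising α above x(1), lowering it, or moving β) can only lower `log_likelihood`, which is a cheap way to confirm the maximum.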

2. Illustration. For a random sample of size n from a population with uniform distribution f(x) = 1/θ, 0 ≤ x ≤ θ, we have

$$\ln L = -n\ln\theta, \qquad \frac{\partial \ln L}{\partial\theta} = -\frac{n}{\theta}.$$

Obviously we get nowhere setting this last expression equal to zero, for this demands θ be infinite. Equally useless is to say that L is largest when θ is smallest and so let's take θ to be x(1), the smallest value in our sample. Elementary considerations, on the other hand, tell us that 0 ≤ x(1) ≤ ... ≤ x(n) ≤ θ, and therefore we must have θ at least as big as x(n). Consequently, under this constraint, our maximum likelihood estimator is

$$\hat\theta = x_{(n)}.$$

Note the Cramér-Rao Inequality and the theory depending on it do not hold here, since our parameter is a value at the end of the domain of the variable and hence not within an interval to permit using differentiation for relative minimum-maximum analysis. If one did not recognize this and calculated the lower bound for the variance of the M. L. E. as given by the Cramér-Rao Inequality, he would get

$$\ln f(x) = -\ln\theta, \qquad \frac{\partial \ln f(x)}{\partial\theta} = -\frac{1}{\theta},$$

$$nE\left[\frac{\partial \ln f(x)}{\partial\theta}\right]^2 = n\int_0^{\theta}\frac{1}{\theta^2}\times\frac{1}{\theta}\,dx = \frac{n}{\theta^2}.$$

Therefore

$$\operatorname{var}\hat\theta \ge \theta^2/n.$$


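But x(n) is our M. L. E., and we can find its variance as follows:

$$g(x_{(n)})\,dx_{(n)} = n\left[\int_0^{x_{(n)}}\frac{1}{\theta}\,dx\right]^{n-1}\frac{1}{\theta}\,dx_{(n)},$$

$$g(x_{(n)}) = \frac{n\,x_{(n)}^{n-1}}{\theta^n},$$

$$E\{x_{(n)}\} = \frac{n}{n+1}\,\theta, \qquad E\{x_{(n)}^2\} = \frac{n}{n+2}\,\theta^2.$$

Therefore

$$\operatorname{var}(x_{(n)}) = \theta^2 \times \frac{n}{(n+1)^2(n+2)},$$

which is smaller than the lower bound to the variance found by using the theory when the hypothesis for it was not satisfied. So all is well, if you look at all of it.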
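The comparison can be seen by simulation. The sketch assumes θ = 3 and n = 20 (arbitrary) and draws repeated samples with Python's `random.Random.uniform`; the simulated mean and variance of x(n) should be close to nθ/(n + 1) and θ²n/((n + 1)²(n + 2)), both well below the θ²/n "bound". The helper names are hypothetical.

```python
import random
import statistics


def var_max_exact(n, theta):
    # var(x(n)) = theta^2 * n / ((n+1)^2 (n+2)) for a uniform parent on (0, theta)
    return theta**2 * n / ((n + 1) ** 2 * (n + 2))


def naive_bound(n, theta):
    # theta^2 / n, obtained when the Cramer-Rao hypothesis is (wrongly) ignored
    return theta**2 / n


theta, n = 3.0, 20  # arbitrary illustrative values
rng = random.Random(3)
maxima = [max(rng.uniform(0, theta) for _ in range(n)) for _ in range(40_000)]
sim_mean = statistics.fmean(maxima)
sim_var = statistics.variance(maxima)

print(round(sim_mean, 3), round(n * theta / (n + 1), 3))  # E{x(n)} = n theta / (n+1)
print(round(sim_var, 4), round(var_max_exact(n, theta), 4), naive_bound(n, theta))
```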

Incidentally the formula

$$E\{x_{(n)}\} = [n/(n+1)]\,\theta$$

is useful in predicting the maximum value in an assumed rectangular distribution when you have only a sample of size n. You simply use it in reverse and say θ is (n + 1)/n multiplied by the sample maximum. To say when this is reliable and to what extent requires more analysis than we will go into here. One needs to calculate a probability statement about the difference between [(n + 1)/n]x(n) and θ so as to get some sort of confidence interval for θ in terms of the [(n + 1)/n]x(n) from a sample.

For n large, say n ≥ 100, we know that the area under g(x(n)) to the right of the mean, [n/(n + 1)]θ, is about .63, while in an interval of length θ/(n + 1) to the left of the mean it is about .23, since [n/(n + 1)]^n ≈ e^(−1) and [(n − 1)/(n + 1)]^n ≈ e^(−2). Hence about 86% of the time x(n) lies from [(n − 1)/(n + 1)]θ to θ. So we can say

$$\Pr\left\{\frac{n-1}{n+1}\,\theta < x_{(n)} < \theta\right\} = .86.$$

This can be written as

$$\Pr\left\{\frac{n}{n+1}\cdot\frac{n+1}{n}\,x_{(n)} < \theta < \frac{n}{n-1}\cdot\frac{n+1}{n}\,x_{(n)}\right\} = .86.$$

So if you call [(n + 1)/n]x(n) your estimate θ̂ of θ, then we can say that we are 86% confident that θ lies in

$$\left(\frac{n}{n+1}\,\hat\theta,\; \frac{n}{n-1}\,\hat\theta\right).$$

To see how effective this really can be, suppose we assume a rectangular distribution and wish to estimate the upper bound or largest value from a sample of size 100. If the largest value in the sample is also 100, then our θ̂ is 101 and the 86% confidence interval is roughly (100, 103).
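The .63 and .23 figures, and the interval itself, can be computed exactly for the rectangular case. A sketch with hypothetical helper names, assuming n = 100 and a sample maximum of 100 as in the example:

```python
def right_of_mean(n):
    # Area under g(x(n)) = n x^(n-1) / theta^n to the right of n theta/(n+1)
    return 1 - (n / (n + 1)) ** n


def band_left_of_mean(n):
    # Area between (n-1) theta/(n+1) and n theta/(n+1)
    return (n / (n + 1)) ** n - ((n - 1) / (n + 1)) ** n


def interval_from_max(x_max, n):
    # theta_hat = [(n+1)/n] x(n); interval (n theta_hat/(n+1), n theta_hat/(n-1))
    theta_hat = (n + 1) / n * x_max
    return n / (n + 1) * theta_hat, n / (n - 1) * theta_hat


n = 100
print(round(right_of_mean(n), 3), round(band_left_of_mean(n), 3))  # about .63 and .23
print(round(right_of_mean(n) + band_left_of_mean(n), 3))           # about .86
print(interval_from_max(100.0, n))                                  # about (100.0, 102.02)
```

The computed upper limit is about 102.02; the rough interval quoted above rounds it outward.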

It must be remembered that the previous example was quite restrictive and that, if the base population is other than rectangular, a new probability function for x(n) needs to be calculated, a new mean, and no doubt new considerations as to just what may be meant by a maximum in the base population, to say nothing of the effect of the sample size n.

G.  Confidence  Interval  +  Order  Statistic. 

Suppose we assume the base population distribution of the previous illustration and then we ask for the smallest sample size such that we can be 99% certain that x(n) cuts off to the left the fraction β of the population. Well, this means we must find a sample size such that the following probability statement is true,

$$\Pr\{x_{(n)}/\theta > \beta\} = .99.$$

This can be evaluated as

$$1 - \int_0^{\beta\theta} \frac{n\,x^{n-1}}{\theta^n}\,dx = 1 - \beta^n = .99.$$

Suppose we take β = .95. Then we have

$$\Pr\{x_{(n)}/\theta > .95\} = 1 - (.95)^n = .99,$$

which gives

$$n = \frac{\ln(.01)}{\ln(.95)} \approx 90.$$

This means if we take a sample of size 90, then we can be 99% confident that the largest value in our sample chops off at least 95% of the universe.
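The sample-size calculation inverts 1 − β^n ≥ .99 and rounds up to a whole number of observations. A minimal sketch; `min_sample_size` is a hypothetical name.

```python
import math


def min_sample_size(beta, confidence):
    # Smallest n with 1 - beta**n >= confidence, i.e. n >= ln(1 - confidence) / ln(beta)
    return math.ceil(math.log(1 - confidence) / math.log(beta))


n = min_sample_size(0.95, 0.99)
print(n, round(1 - 0.95**n, 4))  # 90 0.9901
```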

You might say we have a one-sided confidence interval for θ here, since we can express this as

$$\Pr\{\theta < x_{(n)}/.95\} = .99 \quad\text{for } n = 90,$$

or (x(n), x(n)/.95) is a 99% confidence interval for θ, the maximum value of the population.

If we take our previous illustration where x(n) = 100, then we have 99% confidence in θ lying between 100 and 106 for about the same sample size. But if you look at the previous section you will note that the two-sided confidence interval given there really is

$$\left(x_{(n)},\; \frac{n+1}{n-1}\,x_{(n)}\right),$$

a one-sided interval, and gives us less confidence, 86%, in a smaller interval, (100, 103), consistent with our later estimate.
1. Illustration. Our hypothesis tells us that a random variable x is distributed according to the law f(x) = x/2, 0 ≤ x ≤ 2. We want to test this hypothesis by using for a test variate the value of the largest observation of three observations drawn at random from the base population. Using a one-sided critical region on the right with a level of significance of .05, let us determine whether the hypothesis should be accepted or rejected by the experiment which yielded the three sample values .211, 1.96, and 1.52.

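a. Argument. The density function of x(3) is

$$g(x_{(3)}) = 3\left[\int_0^{x_{(3)}} f(x)\,dx\right]^2 f(x_{(3)}), \qquad 0 \le x_{(3)} \le 2,$$

which is motivated by the diagram.

Using the assumed form for f(x), we find

$$g(x_{(3)}) = 3\left(\frac{x_{(3)}^2}{4}\right)^2 \frac{x_{(3)}}{2} = \frac{3}{32}\,x_{(3)}^5, \qquad 0 \le x_{(3)} \le 2.$$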

Now the largest value in the experimental sample is 1.96, and this is the value we must use.
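The text breaks off before the comparison is made. One way to finish it is sketched below, assuming the usual rule of rejecting when the observed maximum exceeds the upper .05 point of the distribution of x(3). Integrating g gives the CDF G(t) = (t/2)^6; the helper name `max_of_three_cdf` is ours.

```python
# Assumed model from the text: f(x) = x/2 on [0, 2], so F(x) = x**2 / 4 and the
# largest of three observations has CDF G(t) = F(t)**3 = (t/2)**6.
def max_of_three_cdf(t: float) -> float:
    t = min(max(t, 0.0), 2.0)
    return (t / 2.0) ** 6

alpha = 0.05
# Right-hand critical value: G(c) = 1 - alpha  =>  c = 2 * (1 - alpha)**(1/6)
critical_value = 2.0 * (1.0 - alpha) ** (1.0 / 6.0)

sample = [0.211, 1.96, 1.52]
observed_max = max(sample)
p_value = 1.0 - max_of_three_cdf(observed_max)   # Pr{x(3) >= observed max}

print(f"critical value ~ {critical_value:.3f}")   # critical value ~ 1.983
print(f"p-value ~ {p_value:.3f}")                 # p-value ~ 0.114
print("reject" if observed_max > critical_value else "accept")   # accept
```

Under these assumptions 1.96 falls short of the critical value of about 1.983, so the hypothesis would be accepted at the .05 level.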
VII. NONPARAMETRIC AND DISTRIBUTION-FREE TESTS


The two adjectives in the above title are used interchangeably in the literature. We will follow this usage, though an argument can be made for distinguishing between them.

Most of the tests we have used were based in some way on the assumption of normality. However, in practice we often know nothing about the parent population, and so we need tests which do not depend on any assumption about the form of its distribution function. Distribution-free tests are based on order statistics or ordered samples; that is, we suppose the sample is ordered so that the observed data are arranged in increasing order of magnitude. In contrast to the common measures of location and dispersion, i.e., the mean and standard deviation with which we concern ourselves in parametric testing, here we use the median, quartiles, quantiles, etc., since they depend only on the ordering of the data by magnitude, while the mean and standard deviation do not. In particular, when samples are small, distribution-free tests have proved safer than parametric ones, for which an error in, or lack of precise information about, the required assumptions can have rather dire consequences.


A.  Sign  Tests. 

In earlier work we have tested, on the basis of a sample, whether a distribution was "located" at some prespecified point. Now let's test this nonparametrically. To do so we use the median x̃ of the sample to estimate the true median μ of the base population. Suppose we wish to test whether some other number μ0 could be μ.

Let x1, x2, ..., xn be our sample. Consider

Hypothesis: Median of distribution = μ0
Alternative: Median of distribution ≠ μ0

To compute what is needed from our specific sample, we simply observe the signs of the differences

x1 - μ0, x2 - μ0, ..., xn - μ0

and record the number of positive signs, y. Now Y is a random variable, since (x1, x2, ..., xn) is a random sample. Moreover, under this hypothesis Y has the binomial distribution

$$f_B(y) = \binom{n}{y}\frac{1}{2^n}, \qquad y = 0, 1, \ldots, n,$$

since  the  probability  of  an  observation  falling  to  the  right  or  left  of  the 
true  median  is  1/2  in  either  case.  We  might  as  well  assume  a  continuous 
probability  distribution  for  the  base  population  X  so  that  values  equalling 
the  median  have  probability  zero  and  hence  can  be  neglected. 

So, for our sample, we find how probable the observed value of Y is and thereby make a decision about the μ0 which gave rise to it.

1. Illustration. For the sample demands 853, 857, 861, 851, 856, 859, 854, 849, consider the hypothesis that the median of the base population is 850, the alternative hypothesis being that it is not.

a. Argument. Apparently we should use a two-sided test, rejecting the hypothesis if y is either too large or too small. The test statistic's probability distribution is

$$f_B(y) = \binom{8}{y}\frac{1}{256}, \qquad y = 0, 1, \ldots, 8,$$

which in tabular form is

y         0     1     2     3     4     5     6     7     8
f(y)   .004  .031  .109  .219  .274  .219  .109  .031  .004
F(y)   .004  .035  .144  .363  .637  .856  .965  .996 1.000

Before we use the particular value y = 7 which our sample gives, we note from the above table that

$$\Pr\{y < 1 \text{ or } y > 7\} = \Pr\{y = 0 \text{ or } y = 8\} = f(0) + f(8) = .008,$$

$$\Pr\{y < 2 \text{ or } y > 6\} = \Pr\{y = 0, 1, 7, \text{ or } 8\} = f(0) + f(1) + f(7) + f(8) = .070.$$

If we go back to our original derivation of the distribution function for the test statistic y and define y to count minus signs instead of plus signs, then the same value of μ0 would give us, for the same sample, the value 8 - y instead of y. This is why we use a two-sided test here: to wash out the effect of this arbitrariness. In other words, the rarity due to chance of a particular μ0 as a candidate for the median must be judged in a way that transcends this arbitrary choice in the definition of y. In our case this means we must think of y = 1 along with y = 7 as describing the actual situation of our data and hypothesized median.

Now if we decide to reject at the 1% level, then we see that y must be 0 or 8 (total probability .008) for us to reject our hypothesis. Hence in our particular situation, where y = 7, our test accepts the hypothesis at the 1% rejection level. The same conclusion would hold at the 5% rejection level, since a result as extreme as the one on hand (y ≤ 1 or y ≥ 7) occurs by chance, as seen through the eyes of our test statistic, 7% of the time, which is not rarer than 5% of the time. We cannot get a total of exactly 5% of probability from the tails of our discrete probability function.

If we raise our rejection level to 7%, then we would reject the hypothesis. But this is not a very stringent requirement for rejection.
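The sign-test calculation for this illustration can be reproduced with a small Python sketch. It drops any observation equal to μ0, which matches the text's assumption of a continuous population; the function name `sign_test` is ours, and the two-sided probability is taken as the sum of the two equally extreme tails of the symmetric binomial.

```python
from math import comb

def sign_test(sample, mu0):
    """Two-sided sign test of H: median = mu0 (values equal to mu0 are dropped)."""
    diffs = [x - mu0 for x in sample if x != mu0]
    n = len(diffs)
    y = sum(1 for d in diffs if d > 0)          # number of plus signs
    pmf = [comb(n, k) / 2 ** n for k in range(n + 1)]
    k = min(y, n - y)                            # how far into the nearer tail y lies
    # Symmetric binomial: add the two equally extreme tails, capped at 1.
    p_two_sided = min(1.0, 2 * sum(pmf[: k + 1]))
    return y, pmf, p_two_sided

demands = [853, 857, 861, 851, 856, 859, 854, 849]
y, pmf, p = sign_test(demands, 850)
print(y)               # 7
print(round(p, 3))     # 0.07
```

The two-sided probability is 18/256 ≈ .070, the same figure read from the table above.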

In the same vein of thought, but slightly more general, lies the testing of whether two different samples {xi} and {yi} come from the same population. If the samples are fairly large, we can invoke the Laplace-DeMoivre theorem, as seen in the following case.

2. Illustration. Suppose we have two independent random samples x1, x2, ..., xn and y1, y2, ..., yn and that we wish to examine the possibility that they came from the same population with a distribution function which we do not know.

a. Argument. Now the xi are not only random among themselves but also, under the assumption of a common base population distribution function, random among the yi. Hence the probability is 1/2 that any yi is less than any xi.

Let us prove this in general for any two independent random sample values y and x. If f(x) is the common density function on (0, ∞), then from the joint density function we get

$$\Pr\{y < x\} = \int_0^\infty f(x)\,dx \int_0^x f(y)\,dy,$$

since the admissible region in the x-y plane for our event "y < x" is as shown in Figure 24.

If we let $z = \int_0^x f(y)\,dy$, then dz = f(x)dx, and z = 0 when x = 0 while z = 1 when x = ∞. Therefore

$$\Pr\{y < x\} = \int_0^1 z\,dz = \frac{1}{2}.$$

Thus we see that this probability does not depend on f(x), and hence the result is distribution-free. So we are on firm ground to say, for zi = xi - yi, with ui = 1 if zi > 0 and ui = 0 if zi < 0, that

$$\Pr\{u_i = 1\} = \Pr\{u_i = 0\} = \frac{1}{2}.$$

Now ui is a random variable and consequently so is w = Σui. Its mean and variance are seen to be

$$E\{w\} = \sum E\{u_i\} = \left(\tfrac12 + \tfrac12 + \cdots + \tfrac12\right) = \frac{n}{2},$$

$$\operatorname{var}\{w\} = \sum \operatorname{var}\{u_i\} = \left(\tfrac14 + \tfrac14 + \cdots + \tfrac14\right) = \frac{n}{4}.$$

Therefore the standard deviation of w is √n/2.

Recall how we proved the Central Limit Theorem in the previous course on pages 174-177. So if n is large enough, say over 30, we are pretty sure that w, though discrete, can be adequately described by a normal distribution. This allows us to make the statement

$$\Pr\left\{-z_c < \frac{w - n/2}{\sqrt{n}/2} < z_c\right\} = c,$$

which leads us to the associated statement

$$\Pr\left\{\frac{n}{2} - z_c\frac{\sqrt{n}}{2} < w < \frac{n}{2} + z_c\frac{\sqrt{n}}{2}\right\} = c.$$

To further exemplify how the large sample theory joins the distribution-free work, we take c = .95, n = 100, n/2 = 50, √n/2 = 5, z_c = 1.96, and find

$$\Pr\{50 - 1.96(5) < w < 50 + 1.96(5)\} \approx .95,$$

or

$$\Pr\{40 < w < 60\} \approx .95.$$

We  would  therefore  reject  the  hypothesis  of  the  same  population 
for  the  two  sets  of  data  each  of  size  100  at  the  5%  critical  level  if  the 
sum  of  the  ui  is  greater  than  60  or  less  than  40.  Basically  this  is  analogous 
to  the  first  sign  test;  Sui  is  our  binomially  distributed  test-variate 
with  p  =  1/2. 
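The large-sample version of this two-sample sign test can be sketched in Python. The sketch assumes no ties between paired values; the names `sign_count` and `acceptance_region` are ours, and the exact binomial probability is included only as a comparison with the normal approximation.

```python
import math
from statistics import NormalDist

def sign_count(xs, ys):
    """w = number of pairs with x_i > y_i (ties assumed not to occur)."""
    return sum(1 for x, y in zip(xs, ys) if x > y)

def acceptance_region(n, c=0.95):
    """Normal-approximation bounds n/2 -/+ z_c * sqrt(n)/2 for w."""
    z = NormalDist().inv_cdf(0.5 + c / 2)        # about 1.96 for c = .95
    half_width = z * math.sqrt(n) / 2
    return n / 2 - half_width, n / 2 + half_width

lo, hi = acceptance_region(100)
print(round(lo, 1), round(hi, 1))               # 40.2 59.8

# Exact binomial probability of the same region (41 <= w <= 59), for comparison.
exact = sum(math.comb(100, k) for k in range(41, 60)) / 2 ** 100
print(round(exact, 3))
```

Because w is an integer, the region 40.2 < w < 59.8 is the same as 41 ≤ w ≤ 59, i.e. 40 < w < 60.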


B.  Point  and  Interval  Estimation. 


As we said earlier, the base population median is estimated by the sample median, which is not in general unbiased but is consistent. Similarly, we estimate the population quantiles by the corresponding sample quantiles. These are point estimators.

In contrast, to obtain a confidence interval estimate for μ, we use the equal probability concept for an observation being to the right or left of μ. It then follows that the probability that x(r), the r-th order statistic, exceeds μ is

$$\begin{aligned}
\Pr\{x_{(r)} > \mu\} &= \Pr\{x_{(i)} > \mu,\ i = 1, 2, \ldots, n\} \\
&\quad + \Pr\{x_{(1)} < \mu;\ x_{(i)} > \mu,\ i = 2, 3, \ldots, n\} \\
&\quad + \Pr\{x_{(i)} < \mu,\ i = 1, 2;\ x_{(j)} > \mu,\ j = 3, 4, \ldots, n\} \\
&\quad + \cdots \\
&\quad + \Pr\{x_{(i)} < \mu,\ i = 1, 2, \ldots, r-1;\ x_{(j)} > \mu,\ j = r, \ldots, n\}.
\end{aligned}$$

So, if $f_B(i) = \binom{n}{i}\left(\frac12\right)^n$, i = 0, 1, ..., n, then

$$\Pr\{x_{(r)} > \mu\} = \sum_{i=0}^{r-1} f_B(i).$$

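Since p = 1/2, our binomial distribution is symmetric, that is, f_B(i) = f_B(n - i). Further we find

$$\Pr\{x_{(s)} < \mu\} = \sum_{i=s}^{n} f_B(i) = \sum_{i=s}^{n} \binom{n}{i}\frac{1}{2^n},$$

and so

$$\Pr\{x_{(r)} < \mu < x_{(s)}\} = \sum_{i=r}^{s-1} f_B(i) = \sum_{i=r}^{s-1} \binom{n}{i}\frac{1}{2^n}, \qquad r < s.$$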

Thus (x(r), x(s)) is a confidence interval for μ, and the amount of confidence is the value of the sum of the probabilities on the right side of the last equation. These sums can be computed directly or by use of the tables of the Incomplete Beta function, e.g.,

$$1 - F_B(x) = \sum_{t=x+1}^{n} \binom{n}{t} p^t q^{n-t} = I_p(x+1,\, n-x) = \frac{\int_0^p y^{x}(1-y)^{n-x-1}\,dy}{\int_0^1 y^{x}(1-y)^{n-x-1}\,dy}.$$

1. Illustration. For a sample of size 6, we find

a. $\Pr\{x_{(1)} < \mu < x_{(6)}\} = \frac{62}{64} \approx .97.$

b. $\Pr\{x_{(2)} < \mu < x_{(5)}\} = \frac{50}{64} \approx .78.$
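The confidence attached to an order-statistic interval can be computed exactly with a short Python sketch of the sum above. The function name `order_stat_confidence` is ours; with the default p = 1/2 it treats the median, and passing another p gives the analogous sum for the p-quantile.

```python
from fractions import Fraction
from math import comb

def order_stat_confidence(n, r, s, p=Fraction(1, 2)):
    """Pr{x(r) < xi_p < x(s)} = sum_{i=r}^{s-1} C(n,i) p^i (1-p)^(n-i), r < s.

    With p = 1/2 this is the confidence for the median interval (x(r), x(s)).
    """
    if not 1 <= r < s <= n:
        raise ValueError("need 1 <= r < s <= n")
    q = 1 - p
    return sum(comb(n, i) * p**i * q**(n - i) for i in range(r, s))

print(order_stat_confidence(6, 1, 6))   # 31/32  (= 62/64)
print(order_stat_confidence(6, 2, 5))   # 25/32  (= 50/64)
```

Exact fractions are used so the results can be compared directly with 62/64 and 50/64.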

2. Illustration. Suppose Q1 is the lower quartile or .25 quantile. Then the probability of a random value being to the left of it is 1/4, to the right 3/4. Hence for a sample of size 6

$$\begin{aligned}
\Pr\{x_{(1)} < Q_1 < x_{(4)}\} &= \Pr\{x_{(1)} < Q_1 < x_{(2)}\} + \Pr\{x_{(2)} < Q_1 < x_{(3)}\} + \Pr\{x_{(3)} < Q_1 < x_{(4)}\} \\
&= \frac{1458 + 1215 + 540}{4096} = \frac{3213}{4096} \approx .78.
\end{aligned}$$

We note, since the above sum is of three consecutive terms of the binomial distribution

$$f_B(i) = \binom{6}{i}\left(\frac14\right)^i\left(\frac34\right)^{6-i},$$

that this probability (confidence) could have been obtained from the Incomplete Beta as

$$I_{1/4}(1, 6) - I_{1/4}(4, 3).$$
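As a check on the Incomplete Beta relation, the Python sketch below computes I_p(x+1, n-x) two ways: exactly from the binomial upper tail, and as the ratio of integrals evaluated by composite Simpson's rule. Simpson's rule is our choice of numerical method, and both function names are ours.

```python
from fractions import Fraction
from math import comb

def binomial_upper_tail(n, x, p):
    """1 - F_B(x) = sum_{t=x+1}^{n} C(n, t) p^t q^(n-t); by the text this is I_p(x+1, n-x)."""
    q = 1 - p
    return sum(comb(n, t) * p**t * q**(n - t) for t in range(x + 1, n + 1))

def inc_beta_simpson(p, a, b, steps=2000):
    """I_p(a, b) as a ratio of integrals, each by composite Simpson's rule (a, b >= 1)."""
    def integrand(y):
        return y ** (a - 1) * (1 - y) ** (b - 1)

    def simpson(upper):
        h = upper / steps
        total = integrand(0.0) + integrand(upper)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * integrand(k * h)
        return total * h / 3

    return simpson(p) / simpson(1.0)

p = Fraction(1, 4)
exact = binomial_upper_tail(6, 0, p) - binomial_upper_tail(6, 3, p)   # I(1,6) - I(4,3)
numeric = inc_beta_simpson(0.25, 1, 6) - inc_beta_simpson(0.25, 4, 3)
print(exact)              # 3213/4096
print(round(numeric, 6))
```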

We  will  talk  about  quantiles  in  general  shortly. 


C.  Tolerance  Limits. 

You  will  recall  we  spoke  of  these  earlier  in  Chapter  III  when  we 
were  estimating  what  size  of  spread  would  contain  a  certain  percentage 
of  the  base  population  whose  form  of  distribution  was  known.  At  that 


time  we  said  we  would  return  to  the  same  concept  when  we  no  longer  knew 
the  base  population  distribution  function.  Let  us  begin  by  studying  an  example. 


1. Illustration. Consider a random sample (x1, x2, x3, x4) and the random interval [x(1), x(4)]. What proportion of future sampled items will fall in this region?

a. Argument. First we must recognize that we can give only a qualified answer. That is, the proportion P we get will depend on the confidence we wish to put into it, and vice versa. Now let us ask for 95% coverage. Then

$$\Pr\{P([x_{(1)}, x_{(4)}]) \ge .95\} = \frac{4!}{2!\,1!}\int_{.95}^{1} x^2(1-x)\,dx = 1 - I_{.95}(3, 2) = .014.$$

Thus there is only a small chance that the interval [x(1), x(4)] will contain 95% of the probability of the distribution. Incidentally, in the above equation we used the general formula for the probability element of the range V = F(x(n)) - F(x(1)), in probability units, of a sample of size n, namely,

$$n(n-1)V^{n-2}(1-V)\,dV.$$
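The coverage probability can be checked in Python, both from the closed form obtained by integrating the range density and by simulation. Because V = F(x(n)) - F(x(1)), uniform samples suffice for the simulation; the function names and the seed are ours.

```python
import random

def prob_coverage_at_least(n, beta):
    """Pr{V >= beta} for V = F(x(n)) - F(x(1)): 1 - (n beta^(n-1) - (n-1) beta^n)."""
    return 1.0 - (n * beta ** (n - 1) - (n - 1) * beta ** n)

def simulate_coverage(n, beta, reps=200_000, seed=1):
    """Monte Carlo estimate using uniform samples; since V = F(x(n)) - F(x(1)),
    the result does not depend on the parent distribution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        u = [rng.random() for _ in range(n)]
        if max(u) - min(u) >= beta:
            hits += 1
    return hits / reps

exact = prob_coverage_at_least(4, 0.95)
print(round(exact, 4))                       # 0.014
print(round(simulate_coverage(4, 0.95), 4))
```

The exact value is 0.01401875, so the chance that [x(1), x(4)] covers 95% of the distribution is about 1.4%.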

In general, for a coverage P ≥ β and a confidence c, we must solve

$$\int_\beta^1 n(n-1)V^{n-2}(1-V)\,dV = 1 - \left(n\beta^{n-1} - (n-1)\beta^n\right) = c,$$

or equivalently

$$F_V(\beta) = \Pr\{V \le \beta\} = 1 - c.$$

Such transcendental equations are usually intractable analytically and are solved by trial and error. In the case when β and c are given and we wish to determine the smallest n for this desired tolerance interval, we must solve

$$F_V(\beta) = n\beta^{n-1} - (n-1)\beta^n = 1 - c$$
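for n. An approximation is

$$n \approx \frac{1}{4}\,\chi^2_{1-c;\,4}\,\frac{1+\beta}{1-\beta} + \frac{1}{2},$$

where χ²_{1-c;4} denotes the upper 1 - c point of the chi-square distribution with 4 degrees of freedom. For example, when c = .95 and β = .99, we have χ²_{.05;4} = 9.488 and get

$$n \approx \frac{1}{4}(9.488)\,\frac{1.99}{.01} + \frac{1}{2} \approx 472.5, \quad \text{so that } n = 473.$$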

It is no wonder our original sample of size 4 gave us such a small chance of containing 95% of the probability of the distribution. As a matter of fact, we need n = 130 to get 99% confidence that [x(1), x(n)] will account for 95% of the action.
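The trial-and-error search for the smallest n can be automated with a short Python sketch. It relies on F_V(β) decreasing in n; the chi-square approximation shown for comparison is assumed to be n ≈ ¼χ²(1+β)/(1-β) + ½ with 4 degrees of freedom, and the value 9.488 is taken from the text. All function names are ours.

```python
def tolerance_cdf(n, beta):
    """F_V(beta) = n beta^(n-1) - (n-1) beta^n = Pr{V <= beta}."""
    return n * beta ** (n - 1) - (n - 1) * beta ** n

def min_sample_size(beta, c, n_max=100_000):
    """Smallest n for which [x(1), x(n)] covers at least beta with confidence c,
    found by a direct search (F_V(beta) decreases as n grows)."""
    for n in range(2, n_max + 1):
        if tolerance_cdf(n, beta) <= 1 - c:
            return n
    raise ValueError("no n found below n_max")

def approx_sample_size(beta, chi2_4):
    """Assumed approximation n ~ chi2_4/4 * (1+beta)/(1-beta) + 1/2 (4 d.f. upper point)."""
    return chi2_4 / 4 * (1 + beta) / (1 - beta) + 0.5

print(min_sample_size(0.99, 0.95))                  # 473
print(min_sample_size(0.95, 0.99))                  # 130
print(round(approx_sample_size(0.99, 9.488), 1))    # 472.5
```

A direct search is cheap here because each step is a single evaluation of F_V; for n = 130 and β = .95 it gives F_V ≈ .00997, just under the required .01.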

This concept of working with a percentage of the probability, and not with the same percentage of the range, was first given by the late S. S. Wilks in two short classic papers. They mark, at a later date, as great a contribution as the earlier confidence interval did for a parameter of a distribution. It was known to Wilks and others that a percentage of the range could not be handled in this way.

D.  Confidence  Intervals  for  Quantiles. 


$$I_p(k_1,\, n+1-k_1) - I_p(k_1+k_2,\, n+1-k_1-k_2),$$

which is

$$\Pr\{x_{(k_1)} < x_p < x_{(k_1+k_2)}\},$$

which in turn is really the probability of

$$F(x_{(k_1)}) < p < F(x_{(k_1+k_2)}).$$


Wilks tied up the essentials of all this in a very important theorem:

Wilks' General Theorem. If V_r is the sum of any r of the coverages U_i, where U_i = F(x(i)) - F(x(i-1)), then the probability element of V_r is

$$\frac{n!}{(r-1)!\,(n-r)!}\,V_r^{\,r-1}(1-V_r)^{n-r}\,dV_r, \qquad 0 < V_r < 1,$$

which is the Beta distribution with parameters r and n - r + 1. The corollary of Wilks' General Theorem is also very useful and may be stated as:

Corollary. The average amount of probability for any one coverage is, taking r = 1,

$$E\{V_1\} = \frac{1}{B(1, n)}\int_0^1 V_1(1-V_1)^{n-1}\,dV_1 = \frac{B(2, n)}{B(1, n)} = \frac{1}{n+1}.$$


It  is  no  wonder  some  people  say  that  confidence  intervals  on  quantiles 
are  equivalent  to  tolerance  statements  about  the  population  with  the  same 
confidence. 


E.  Probability  Paper  Again. 

The  corollary  just  given  by  rights  ought  to  be  stated  as: 

Theorem.  For  any  continuous  distribution  the  expected  values  of 
the  n  +  1  probability  areas  determined  by  the  random  sample  of  n 
values are all equal to each other, and so their common value is 1/(n + 1).
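The theorem can be illustrated by simulation in Python. As an arbitrary choice of continuous parent, the sketch draws from an exponential distribution and maps each ordered value through its CDF; the function name, the seed, and the parent distribution are all our assumptions.

```python
import math
import random

def mean_coverages(n, reps=20_000, seed=7):
    """Average the n+1 probability areas cut off by n ordered observations.

    Observations are drawn from an exponential distribution (an arbitrary choice)
    and mapped through its CDF F(x) = 1 - exp(-x), so each area is
    F(x(i)) - F(x(i-1)) with F(x(0)) = 0 and F(x(n+1)) = 1.
    """
    rng = random.Random(seed)
    totals = [0.0] * (n + 1)
    for _ in range(reps):
        xs = sorted(rng.expovariate(1.0) for _ in range(n))
        cdf = [0.0] + [1.0 - math.exp(-x) for x in xs] + [1.0]
        for i in range(n + 1):
            totals[i] += cdf[i + 1] - cdf[i]
    return [t / reps for t in totals]

print([round(m, 3) for m in mean_coverages(4)])   # each entry close to 1/(n + 1) = 0.2
```

Replacing the exponential by any other continuous distribution (with its own CDF) should leave the averages near 1/(n + 1).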

Remember  how  in  Volume  I  on  pages  108-111  we  used  arithmetic 
probability  paper  to  check  on  normality  in  large  samples.  Now  suppose 
we  have  a  small  sample,  say  size  10.  Then  as  we  promised  on  page  109 
in  Volume  I,  we  would  use  order  statistics  plus  the  above  theorem  in 
order  to  give  us  a  similar  check  with  the  same  graph  paper. 

The practical value of this lies in the fact that for any random sample of size n, the total expected probability area to the left of the i-th order statistic is equal to i/(n + 1). Now if we took the points

$$\left(x_{(1)}, \tfrac{1}{n}\right), \left(x_{(2)}, \tfrac{2}{n}\right), \ldots, \left(x_{(n)}, \tfrac{n}{n}\right)$$

we could not plot the last one, as it does not appear on our paper. The symmetry of the normal distribution suggests that whatever probability is assigned to x(1), one minus it should be assigned to x(n). We could use the "spacings"

$$\left(x_{(1)}, \tfrac{1}{n+1}\right), \left(x_{(2)}, \tfrac{2}{n+1}\right), \ldots, \left(x_{(n)}, \tfrac{n}{n+1}\right)$$

or the "spacings"

$$\left(x_{(1)}, \tfrac{1}{2n}\right), \left(x_{(2)}, \tfrac{3}{2n}\right), \ldots, \left(x_{(n)}, \tfrac{2n-1}{2n}\right).$$

Much depends on your purpose in plotting. If you wish to obtain "optimum" estimates of the base population mean and standard deviation, you will find the literature replete with intricate analyses for each sample size. The last "spacings" given above are called "intuitively plausible" by many authors and are found to be nearly as efficient as the optimum probability "spacings." Moreover, they follow a simple formula.

So we will use this spacing for plotting the order statistic cumulative probabilities on the "linearized" probability scale against the observed values of the sample, which are measured on the arithmetic scale.

1. Illustration. The following ten demands were obtained randomly and then reordered: 162, 191, 198, 212, 220, 232, 240, 252, 265, 286.

The corresponding cumulative probabilities for the associated order statistics' values, using (2i - 1)/2n, are

1/20, 3/20, 5/20, ..., 17/20, 19/20, respectively.

On page 141 we see the plot of these on arithmetic probability paper. They seem to lie near the straight line we drew in by eye, so we accept normality of the base distribution. As before in large samples, we now estimate the mean by the 50th percentile, which is 224, while 262 at the 84th percentile yields the estimate 262 - 224 = 38 for σ, the standard deviation. The sample itself has a mean of 225.8 and a standard deviation of 37.1.

Note that if we use our old formula for estimating σ from the 84th and 16th percentile values, we get (262 - 186)/2 = 38.

The same procedure can be used to estimate parameters for other types of distributions on their own "linearized" probability paper. The spacings of the associated probabilities would change accordingly.
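The plotting-position calculation can be sketched in Python. In place of the line drawn by eye, the sketch fits a least-squares line of the demands on their normal scores; that fitting step is our substitution, so its estimates need not match the graphical 224 and 38 exactly.

```python
from statistics import NormalDist, mean, stdev

demands = [162, 191, 198, 212, 220, 232, 240, 252, 265, 286]   # already ordered
n = len(demands)
positions = [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]   # 1/20, 3/20, ..., 19/20

# Normal scores: where each plotting position falls on the linearized probability scale.
z = [NormalDist().inv_cdf(p) for p in positions]

# Least-squares line x = a + b*z, an objective stand-in for the line drawn by eye.
zbar, xbar = mean(z), mean(demands)
b = (sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, demands))
     / sum((zi - zbar) ** 2 for zi in z))
a = xbar - b * zbar

print(positions[0], positions[-1])                         # 0.05 0.95
print(round(mean(demands), 1), round(stdev(demands), 1))   # 225.8 37.1
print(f"fitted mean ~ {a:.1f}, fitted sigma ~ {b:.1f}")
```

The intercept a estimates the mean (the 50th percentile point) and the slope b estimates σ, which is what reading the 50th and 84th percentiles off the paper does graphically.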


F. The Magnitude Test.

Suppose we have two random samples, {xi} and {yi}, of demands, and we wish to decide whether or not they come from the same parent population. Now we have seen how the sign test, when used in such a situation, considers only the signs of the differences zi = xi - yi and does not take into account the magnitude of these differences. Consider the following data from two samples of size 6,

i      1   2   3   4   5   6
zi     3   4  -1   6   5   1

Under the assumption that the parent population is the same, it follows that the median of zi is zero. Further, it follows that the xi and yi that give a value for zi might just as well have been interchanged. This means any zi could just as well have been positive or negative. So we might consider drawing a random sample of size 6 from the synthetic population of the six pairs ±|zi|, one drawing from each pair. This means each random sample uses each |zi| with one sign or the other. Hence our population consists of 2^6 = 64 equally likely possibilities, ranging from the extreme negative total of -20 when all signs are negative to +20 when all signs are positive. Our test variate is the sum s of the six signed differences. The following table gives the frequency of occurrence of the various sums.


s       ±20  ±18  ±16  ±14  ±12  ±10   ±8   ±6   ±4   ±2    0
f(s)      1    2    1    1    3    4    4    4    4    5    6

So we see the probability of s = 20 is 1/64, of s = -20 is 1/64, of 18 is 2/64, of -18 is 2/64, etc. Obviously the distribution is symmetric about s = 0.
Now suppose we use a significance level of 5% and a one-sided (right-sided) test. Then the nearest we can come to it is by taking the critical region s ≥ 18, which yields Pr{s ≥ 18} = 3/64 = .047. The actual sample value is s = 18, which falls in this upper critical region. Therefore we would reject the null hypothesis, which in this case is that zero is the median, and this in turn rejects the hypothesis that the two random samples came from the same distribution.

Recall the sign test takes no account of the magnitude. In this last example, had we invoked the sign test, the test variate would be x = the number of positive signs, and it would have had the distribution

$$f_B(x) = \binom{6}{x}\frac{1}{2^6}, \qquad x = 0, 1, 2, \ldots, 6.$$

For the same significance level and a right-sided test, we find x = 6 is the only value falling in the critical region, since f(6) = .016 while f(5) + f(6) = .094 + .016 = .11. Since x = 5 in the actual sample, we would accept the null hypothesis that positive and negative signs are equally likely, and hence that both samples come from the same distribution.

In  a  sense  the  magnitude  test  generalizes  the  sign  test  in  that  the 
former  can  be  reduced  to  the  latter  by  taking  all  possible  arrangements 
of  a  fixed  number  of  excesses  and  lumping  the  probabilities  of  their  sums. 

It  is  important  to  note  from  the  previous  discussion  that  for  the 
same  sample(s)  we  have  come  up  with  opposite  decisions  from  two  different 
hypotheses  of  randomness  and  their  test  variates.  The  lesson  to  be  learned 
is  that  we  usually  solve  an  interpretation  of  a  problem  and  not  the  problem 
per  se. 
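The two conflicting decisions can be reproduced by brute-force enumeration in Python: all 64 sign assignments for the magnitude test, and the binomial tail for the sign test. The variable names are ours.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

z = [3, 4, -1, 6, 5, 1]                 # observed differences x_i - y_i
magnitudes = [abs(v) for v in z]

# Under the null hypothesis each |z_i| is equally likely to carry either sign,
# giving 2**6 = 64 equally likely signed sums.
sums = Counter(sum(sign * m for sign, m in zip(signs, magnitudes))
               for signs in product((1, -1), repeat=len(z)))

observed = sum(z)                        # 18
p_magnitude = Fraction(sum(c for s, c in sums.items() if s >= observed), 2 ** len(z))

# Sign test on the same data: number of positive signs and its upper tail.
plus = sum(1 for v in z if v > 0)        # 5
p_sign = Fraction(sum(comb(6, k) for k in range(plus, 7)), 2 ** 6)

print(observed, p_magnitude, float(p_magnitude))   # 18 3/64 0.046875
print(plus, p_sign, float(p_sign))                  # 5 7/64 0.109375
```

The enumeration reproduces the frequency table above, and the two upper-tail probabilities fall on opposite sides of .05.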
G. Conditional Events.

Many practical situations call for an estimate of what to expect next after a sample has been taken. From another point of view, we could suggest certain possibilities and then calculate their chances of occurring. It is along this line of thought that we will proceed. First let us look at an example.

1. Illustration. Suppose we have had ten demands drawn randomly from a hypothetical distribution f(x). For convenience we will assume f(x) is defined over 0 to ∞. What we do will not be limited in application by this, as the same result would evolve for a finite range of x. Now suppose three more demands are randomly drawn from f(x). What is the probability that all three of these demands will be larger than any of the first sample?

a. Argument. We know that

$$g(x_{(10)}) = \frac{10!}{9!}\left[\int_0^{x_{(10)}} f(x)\,dx\right]^9 f(x_{(10)}),$$

and the next three must be larger than x(10). The conditional probability that this happens is

$$\left[\int_{x_{(10)}}^{\infty} f(x)\,dx\right]^3.$$

Therefore the joint probability function of these two events is

$$g(x_{(10)})\left[\int_{x_{(10)}}^{\infty} f(x)\,dx\right]^3, \qquad 0 \le x_{(10)} < \infty.$$

Hence the probability of this event is

$$\int_0^\infty g(x_{(10)})\left[\int_{x_{(10)}}^{\infty} f(x)\,dx\right]^3 dx_{(10)} = 10\int_0^\infty \left[\int_0^{x_{(10)}} f(x)\,dx\right]^9 \left[\int_{x_{(10)}}^{\infty} f(x)\,dx\right]^3 f(x_{(10)})\,dx_{(10)}.$$

If we let $U = \int_0^{x_{(10)}} f(x)\,dx$, then $dU = f(x_{(10)})\,dx_{(10)}$ and the integral becomes

$$10\int_0^1 (1-U)^3\,U^9\,dU = 10\,B(4, 10) = \frac{1}{286}.$$

So we see the probability of the event of interest does not depend on the form of f(x).
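The value 1/286 can be confirmed exactly in Python and by simulation. The simulation uses an exponential parent as an arbitrary continuous f; the comment also notes an equivalent rank argument, which is our addition rather than the text's derivation.

```python
import random
from fractions import Fraction
from math import comb, factorial

# Exact value 10 * B(4, 10) = 10 * 3! * 9! / 13!.  It also equals 1 / C(13, 3): the three
# new values must take the top three ranks among 13 exchangeable values.
exact = Fraction(10 * factorial(3) * factorial(9), factorial(13))
assert exact == Fraction(1, comb(13, 3)) == Fraction(1, 286)

def simulate(reps=100_000, seed=3):
    """Monte Carlo check using an arbitrary continuous f (exponential here)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        first_max = max(rng.expovariate(1.0) for _ in range(10))
        if min(rng.expovariate(1.0) for _ in range(3)) > first_max:
            hits += 1
    return hits / reps

print(exact)         # 1/286
print(simulate())    # close to 1/286, about 0.0035
```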

2. Illustration. For an arbitrary f(x), 0 ≤ x < ∞, find the probability that after a random sample of size n is drawn, the next two observations will lie outside the range of the sample.

a. Argument. This means that the (n + 1)st and (n + 2)nd demands lie outside of x(1) ≤ x ≤ x(n). As we did in the previous illustration, we find the probability P of this event is given by

$$P = n(n-1)\int_0^\infty f(x_{(1)})\,dx_{(1)} \int_{x_{(1)}}^\infty \left[\int_{x_{(1)}}^{x_{(n)}} f(x)\,dx\right]^{n-2}\left[1 - \int_{x_{(1)}}^{x_{(n)}} f(x)\,dx\right]^2 f(x_{(n)})\,dx_{(n)}.$$

Transforming by $U = \int_0^{x_{(1)}} f(x)\,dx$ and $V = \int_0^{x_{(n)}} f(x)\,dx$, we see that 0 ≤ U ≤ V ≤ 1,

$$V - U = \int_{x_{(1)}}^{x_{(n)}} f(x)\,dx,$$

and so

$$P = n(n-1)\int_0^1 dU \int_U^1 (V-U)^{n-2}\left[1 - (V-U)\right]^2 dV = \frac{6}{(n+1)(n+2)},$$

free of the form of the distribution f(x).
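A Python sketch of this result follows, with a simulation check using uniform samples. The rank argument in the docstring is an alternative justification we supply, not the text's integral derivation; the function names and seed are ours.

```python
import random
from fractions import Fraction

def prob_next_two_outside(n):
    """P = 6 / ((n + 1)(n + 2)).  Rank argument: among n + 2 exchangeable values the
    original n must occupy a consecutive block of ranks starting at 1, 2 or 3, i.e.
    3 of the C(n + 2, 2) equally likely rank sets for the two new values."""
    return Fraction(6, (n + 1) * (n + 2))

def simulate(n, reps=100_000, seed=11):
    """Monte Carlo estimate with uniform samples (any continuous f gives the same P)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        lo, hi = min(sample), max(sample)
        new = [rng.random(), rng.random()]
        if all(x < lo or x > hi for x in new):
            hits += 1
    return hits / reps

print(prob_next_two_outside(5))   # 1/7
print(simulate(5))
```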

H.  Elementary  Protection  Level  Calculations. 

The Theorem on page 138 can be used directly for a simple but typical protection level problem. Suppose we have a random sample of size n which is ranked from smallest to largest in our usual notation, x(1) to x(n).

Now when we asked in the previous section about the probability of two or more additional random values behaving in a way conditioned on the original sample values, we ran into some calculus. However, if we ask only about the next value, things are very simple. Suppose we ask for the probability that the next demand, call it x*, is greater than, say, x(r), r ≤ n. Among the n + 1 equally likely intervals created by the ordered values of the size n sample, we are asking for x* to fall into any one of n - r + 1 of them. Therefore

$$\Pr\{x^* > x_{(r)}\} = \frac{n - r + 1}{n + 1}.$$
To apply this, suppose we want a protection level of .80 for an item whose demand is known over the past nine quarters. Then we want the probability of being out of stock to be no more than .20, that is, (10 - r)/10 ≤ .20. Our formula says

r ≥ 10 - .20(10) = 8.

This means that stocking up to the level of the eighth ranked previous demand reduces the probability of stockout to .20. Since n = 9, this is the second highest previous demand.

To find the minimum sample size n for which stocking to the second highest previous value, x(n-1), suffices at different protection levels, we offer the following table:


Protection Level    .50   .60   .70   .80   .85   .90   .95
Minimum n             3     4     6     9    13    19    39

Related tables, varying one or two of the three variables, can be constructed to extend this elementary concept of protection.
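The protection-level formula and the table can be generated with a short Python sketch. It uses exact fractions, so protection levels should be passed as decimal strings such as "0.80" (a float like 0.8 is not exactly 4/5 in binary and could shift a ceiling). The function names are ours.

```python
import math
from fractions import Fraction

def stockout_prob(n, r):
    """Pr{x* > x(r)} = (n - r + 1)/(n + 1) for the next demand x*."""
    return Fraction(n - r + 1, n + 1)

def smallest_rank(n, protection):
    """Smallest r with stockout probability <= 1 - protection, i.e. r >= (n+1)*protection."""
    return math.ceil((n + 1) * Fraction(protection))

def min_n_second_highest(protection):
    """Smallest n for which stocking to x(n-1) gives at least the protection level."""
    target = 1 - Fraction(protection)
    n = 2
    while stockout_prob(n, n - 1) > target:
        n += 1
    return n

print(smallest_rank(9, "0.80"))   # 8
levels = ["0.50", "0.60", "0.70", "0.80", "0.85", "0.90", "0.95"]
print([min_n_second_highest(p) for p in levels])   # [3, 4, 6, 9, 13, 19, 39]
```

For .85 the search gives 13, since with n = 12 the protection is 11/13 ≈ .846, just short of .85, while n = 13 gives 12/14 ≈ .857.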

I. Tests of Randomness.

Since all of our previous theory and technique depended on the random selection or random occurrence of events or data, it is sometimes desirable to test selected data for this property. Actually, the following tests are not capable of proving that randomness exists, even if it does exist. At best they indicate to what degree nonrandomness exists. It is important to realize that none of these tests assures randomness, even when it detects no nonrandomness, and some will frequently fail to detect nonrandomness when it is present. However, these tests are useful in avoiding faulty conclusions due to incorrect assumptions, and they can indicate the need for investigation of factors systematically affecting the results obtained; that is, they can detect the presence of systematic variation.

Nonrandomness  might  be  summed  up  by  the  following  four  character¬ 
istics  of  observed  data: 

1.  discontinuities, 

2.  trends, 

3.  cyclic  or  periodic  movement, 

4.  extreme  values. 

Bear in mind that the first three of the above characteristics are functions of the order in which the observed data, or the observed events from which we get the data, occur. Except for extreme values, nonrandomness as characterized by any of the other three symptoms can usually be made to disappear by a rearrangement.

1. Runs or sign test. Consider the following table giving three different orders of the same outcomes, 7 heads and 13 tails, each from 20 tosses of a coin.


Table XVII

Toss Number   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Order 1       T  T  T  T  H  H  T  T  H  T  H  T  T  H  T  H  T  T  H  T
Order 2       H  T  T  H  T  T  H  T  T  H  T  T  H  T  T  H  T  T  H  T
Order 3       T  T  T  T  T  T  T  H  H  H  H  H  H  H  T  T  T  T  T  T

The first series of heads and tails did occur randomly. The second and third ones are rearrangements of the first one. The first one does not appear unusual, while the other two display some systematic effect. Now it can't be the number of heads, since this is the same in each series. It is the order which draws our attention. The second series is made up almost entirely of the sequence HTT. The third series is composed of a long run of T's followed by a long run of H's followed by a long run of T's.

To get at this in a way more scientifically revealing than what we noted above, let us examine such series for (1) the length of a run of the same event, and (2) the number of runs of different lengths.

The first series has one run of length 4 (tails), three runs of length 2 (tails), one run of length 2 (heads), five runs of length 1 (heads), and three runs of length 1 (tails), for a total of thirteen runs. Note the shorter runs occurred more frequently than the longer runs. On the other hand, the second series, obviously periodic, has more runs, fourteen, but essentially only two kinds, of lengths 1 (heads) and 2 (tails).

In  the  third  series  we  have  but  three  runs,  each  of  great  length.  It 
appears  that  increasing  the  number  of  runs  tends  to  reduce  the  length  of 
runs  and  vice  versa.  Hence  we  seek  a  probabilistic  description  of  both 
at  once. 
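Counting runs is easy to automate. The Python sketch below transcribes the three toss sequences from Table XVII and groups consecutive equal symbols with `itertools.groupby`; the helper name `runs` is ours.

```python
from itertools import groupby

orders = {
    "Order 1": "TTTTHHTTHTHTTHTHTTHT",
    "Order 2": "HTTHTTHTTHTTHTTHTTHT",
    "Order 3": "TTTTTTTHHHHHHHTTTTTT",
}

def runs(seq):
    """Return the list of (symbol, run length) pairs in order of occurrence."""
    return [(sym, len(list(grp))) for sym, grp in groupby(seq)]

for name, seq in orders.items():
    r = runs(seq)
    print(name, len(r), r)   # Order 1: 13 runs, Order 2: 14 runs, Order 3: 3 runs
```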

Actually, what we did with the runs is abstractly equivalent to what we do with the signs of the differences of successive data. Hence it is a more sophisticated form of the use of signs. For if, in place of H's and T's, we have twenty readings, we can take the data in the order of their occurrence and simply use the signs

$$\operatorname{sign}(x_2 - x_1),\; \operatorname{sign}(x_3 - x_2),\; \ldots,\; \operatorname{sign}(x_n - x_{n-1}).$$

We then consider the runs of plus signs and minus signs.
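Here is a minimal sketch of turning readings into a sign sequence and counting its runs. The readings are hypothetical. Skipping zero differences is our own assumption, because the text does not say how to treat ties.

```python
from itertools import groupby


def difference_signs(readings):
    """Return a string of '+'/'-' for x[i+1] - x[i], taken in time order.

    Assumption (not from the text): zero differences (ties) are skipped.
    """
    signs = []
    for prev, curr in zip(readings, readings[1:]):
        if curr > prev:
            signs.append("+")
        elif curr < prev:
            signs.append("-")
    return "".join(signs)


def count_runs(symbols: str) -> int:
    """Number of maximal runs of identical symbols."""
    return sum(1 for _ in groupby(symbols))


readings = [5.1, 5.4, 5.9, 5.2, 5.0, 5.3]  # hypothetical data
signs = difference_signs(readings)
print(signs, count_runs(signs))  # ++--+ 3
```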

In general, suppose we have two different kinds of entities, type A and type B, with m of type A and n of type B in total. In a series of length m + n, let U be the number of runs formed by the m elements of type A plus the number formed by the n elements of type B. Then it can be shown that, when the series is random,

$$\Pr\{U = 2v\} = \frac{2\binom{m-1}{v-1}\binom{n-1}{v-1}}{\binom{m+n}{m}},$$

$$\Pr\{U = 2v + 1\} = \frac{\binom{m-1}{v-1}\binom{n-1}{v} + \binom{m-1}{v}\binom{n-1}{v-1}}{\binom{m+n}{m}}.$$

Extreme values of U have low probability and hence indicate possible nonrandomness. Small values indicate a few long runs, and large values indicate many short runs.
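The two formulas can be checked by brute force for small m and n. The sketch below evaluates them with exact fractions and compares the result with a direct count over every arrangement of m A's and n B's. Its function names are our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb


def runs_pmf(u: int, m: int, n: int) -> Fraction:
    """Pr{U = u} for a random arrangement of m A's and n B's (m, n >= 1)."""
    if u < 2:
        return Fraction(0)
    total = comb(m + n, m)
    if u % 2 == 0:  # U = 2v
        v = u // 2
        ways = 2 * comb(m - 1, v - 1) * comb(n - 1, v - 1)
    else:  # U = 2v + 1
        v = (u - 1) // 2
        ways = (comb(m - 1, v - 1) * comb(n - 1, v)
                + comb(m - 1, v) * comb(n - 1, v - 1))
    return Fraction(ways, total)


def runs_by_enumeration(m: int, n: int) -> dict:
    """Exact distribution of U found by listing every arrangement."""
    counts = Counter()
    for a_positions in combinations(range(m + n), m):
        seq = ["B"] * (m + n)
        for p in a_positions:
            seq[p] = "A"
        u = 1 + sum(seq[i] != seq[i + 1] for i in range(m + n - 1))
        counts[u] += 1
    total = comb(m + n, m)
    return {u: Fraction(c, total) for u, c in counts.items()}


for m, n in [(5, 5), (6, 4), (3, 7)]:
    brute = runs_by_enumeration(m, n)
    assert all(runs_pmf(u, m, n) == brute.get(u, 0) for u in range(1, m + n + 1))
print("formulas agree with enumeration")
```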

a. Illustration 1. In 11 successive quarters, demands for a certain FSN appeared as follows:

3,  7,  8,  10,  11,  13,  12,  11,  7,  6,  8. 

We  will  assume  they  appeared  randomly  from  a  stable  distribution.  By 
taking  successive  differences  we  find  the  sequence  of  signs  is 

+++++----+ 


consisting of three runs. Now, how probable is it, due to chance alone, to get three or fewer runs in a sequence of 6 plus signs and 4 minus signs (the counts in the sequence above)? The probability of at most three runs is

$$\Pr\{U \le 3\} = \Pr\{U = 3\} + \Pr\{U = 2\},$$

since we cannot have fewer than two runs. Take $v = 1$ in each of our previous formulae, along with $m = 6$ (plus signs) and $n = 4$ (minus signs), so that $\binom{m+n}{m} = \binom{10}{6} = 210$. We then find

$$\Pr\{U \le 3\} = \frac{8}{210} + \frac{2}{210} = \frac{10}{210} \approx .0476.$$

So this supposedly random sequence of demands has a property that occurs due to chance only about 5% of the time. When this small a number of runs occurs, it is very likely that some nonrandom behavior is present.
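The calculation for the demand data can be reproduced mechanically. This sketch recomputes the signs of the successive differences and counts the plus signs, minus signs and runs. It then sums Pr{U = 2} and Pr{U = 3} from the run formulas given earlier. `runs_pmf` is our own helper name, repeated here so the block stands alone.

```python
from fractions import Fraction
from itertools import groupby
from math import comb


def runs_pmf(u, m, n):
    """Pr{U = u} for m elements of one kind and n of the other (m, n >= 1)."""
    if u < 2:
        return Fraction(0)
    total = comb(m + n, m)
    v = u // 2  # for odd u this equals (u - 1) // 2
    if u % 2 == 0:
        ways = 2 * comb(m - 1, v - 1) * comb(n - 1, v - 1)
    else:
        ways = comb(m - 1, v - 1) * comb(n - 1, v) + comb(m - 1, v) * comb(n - 1, v - 1)
    return Fraction(ways, total)


demands = [3, 7, 8, 10, 11, 13, 12, 11, 7, 6, 8]
# No two successive demands are equal here, so every difference is + or -.
signs = "".join("+" if b > a else "-" for a, b in zip(demands, demands[1:]))
m, n = signs.count("+"), signs.count("-")
u = sum(1 for _ in groupby(signs))  # observed number of runs
p = sum(runs_pmf(k, m, n) for k in range(2, u + 1))  # Pr{U <= u}
print(signs, m, n, u, p, float(p))
```

With six plus signs and four minus signs, Pr{U ≤ 3} comes out as 10/210, or about .048.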
To use this method, we must remember to transform our data into a sequence of events of two kinds. Commonly one designates each element as above or below the median, thereby creating two classes. This has the advantage of always making the numbers of elements of each kind equal, i.e., we can always take m = n in our previous formulae. If there is an odd number of elements, we drop the median. Miss Swed and Dr. Eisenhart have given a table for this case, that is, for m = n.

A part of it follows in Table XVIII. The table gives entries for α = .05 and for α = .01. For a given number of elements 2m = 2n, each lower critical value is the largest number of runs such that the probability of that many runs or fewer does not exceed α. Each upper critical value is the smallest number of runs such that the probability of that many runs or more does not exceed α.
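Here is a sketch of the median dichotomy just described, using Python's `statistics.median`. Dropping every element equal to the median is our reading of "drop the median"; with tied values this can leave m and n unequal. The data are hypothetical.

```python
from itertools import groupby
from statistics import median


def above_below(data):
    """Label elements 'A' (above) or 'B' (below) the median, in time order.

    Elements equal to the median are dropped; for an odd count with no ties,
    this removes exactly the middle element, so the numbers of A's and B's match.
    """
    med = median(data)
    labels = "".join("A" if x > med else "B" for x in data if x != med)
    return labels, med


data = [4, 9, 2, 7, 5, 8, 1]  # hypothetical sample, n = 7
labels, med = above_below(data)
runs = sum(1 for _ in groupby(labels))
print(med, labels, runs)  # 5 BABAAB 5
```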
Table XVIII

Critical Values of U = Number of Runs

           Lower Critical          Upper Critical
m = n    α = 0.05   α = 0.01    α = 0.05   α = 0.01
  5         3          2           9          10
  6         3          2          11          12
  7         4          3          12          13
  8         5          4          13          14
  9         6          4          14          16
 10         6          5          16          17
 20        15         13          27          29
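Read each lower entry as the largest u with Pr{U ≤ u} ≤ α, and each upper entry as the smallest u with Pr{U ≥ u} ≤ α. Under that reading, the entries can be recomputed from the exact run distribution. This is only a cross-check sketch with helper names of our own, not Swed and Eisenhart's procedure. For m = n = 5 through 10 it reproduces the printed values.

```python
from fractions import Fraction
from math import comb


def runs_pmf(u, m, n):
    """Pr{U = u} for m elements of one kind and n of the other (m, n >= 1)."""
    if u < 2:
        return Fraction(0)
    v = u // 2
    if u % 2 == 0:
        ways = 2 * comb(m - 1, v - 1) * comb(n - 1, v - 1)
    else:
        ways = comb(m - 1, v - 1) * comb(n - 1, v) + comb(m - 1, v) * comb(n - 1, v - 1)
    return Fraction(ways, comb(m + n, m))


def critical_values(m, alpha):
    """(lower, upper) critical numbers of runs when m = n."""
    pmf = {u: runs_pmf(u, m, m) for u in range(2, 2 * m + 1)}
    lower = max((u for u in pmf if sum(pmf[k] for k in pmf if k <= u) <= alpha), default=None)
    upper = min((u for u in pmf if sum(pmf[k] for k in pmf if k >= u) <= alpha), default=None)
    return lower, upper


for m in (5, 6, 7, 8, 9, 10, 20):
    lo05, up05 = critical_values(m, Fraction(5, 100))
    lo01, up01 = critical_values(m, Fraction(1, 100))
    print(m, lo05, lo01, up05, up01)
```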

When m is large (still with m = n), theory tells us that the distribution of the number of runs is nearly normal, with expected value m + 1 and standard deviation

$$\sigma_U = \sqrt{\frac{m(m-1)}{2m-1}}.$$
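Here is a sketch of the large-m normal approximation. It uses mean m + 1 and standard deviation sqrt(m(m − 1)/(2m − 1)), the standard values when m = n. As a hypothetical use, it is applied to nine runs with m = n = 10. The half-unit continuity correction is our own addition, not something the text prescribes.

```python
from math import sqrt
from statistics import NormalDist


def runs_normal_approx(u, m, continuity=True):
    """Approximate Pr{U <= u} when m = n, using mean m + 1 and the standard
    deviation sqrt(m(m - 1) / (2m - 1)).

    continuity=True adds a half-unit continuity correction (our addition).
    """
    mean = m + 1
    sd = sqrt(m * (m - 1) / (2 * m - 1))
    z = (u + (0.5 if continuity else 0.0) - mean) / sd
    return mean, sd, NormalDist().cdf(z)


mean, sd, p = runs_normal_approx(9, 10)
print(mean, round(sd, 4), round(p, 3))
```

For comparison, the exact probability of nine or fewer runs from the run formulas in this case is about .242.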

2. Mean square successive difference test. This test is more powerful than the tests in the former section, but not as quick and easy to apply. The former tests were distribution-free, whereas this one is not. This test depends on a statistic whose distribution was discovered by von Neumann. As has happened so often with our classical distribution functions, he assumed a normal distribution for the base population from which the random samples come.

Here we must compute the average of the squares of the (n - 1) differences between successive elements in a random sample of size n. Now we can prove that the expected value of this statistic, namely of

$$\delta^2 = \frac{\sum_{i=1}^{n-1} (x_{i+1} - x_i)^2}{n - 1},$$

is $2\sigma_x^2$, regardless of the distribution of the base population. But the expected value of the ordinary sample variance, namely of

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1},$$

is $\sigma_x^2$. Therefore we can say the ratio

$$\eta = \frac{\delta^2}{s^2}$$

has expected value 2. Dr. von Neumann gave us the distribution function for η, and in 1942 Dr. Hart gave a table of its values. We repeat that, as has happened in so many other situations, Dr. von Neumann assumed the sample came from a normal distribution. Table XIX is an abbreviated form of Dr. Hart's table.
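Below is a minimal sketch of computing η = δ²/s² from data taken in time order. Both δ² (the mean of the squared successive differences) and s² divide by n − 1. The simulation at the end is only a plausibility check that η averages close to 2 for independent normal samples. The seed, sample size and number of repetitions are arbitrary choices of ours.

```python
import random
from math import fsum


def von_neumann_ratio(x):
    """eta = delta^2 / s^2, with both sums divided by n - 1."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = fsum(x) / n
    s2 = fsum((xi - mean) ** 2 for xi in x) / (n - 1)
    if s2 == 0:
        raise ValueError("all observations are equal; eta is undefined")
    delta2 = fsum((b - a) ** 2 for a, b in zip(x, x[1:])) / (n - 1)
    return delta2 / s2


rng = random.Random(12345)
etas = [von_neumann_ratio([rng.gauss(0.0, 1.0) for _ in range(10)]) for _ in range(5000)]
average = fsum(etas) / len(etas)
print(round(average, 3))  # close to 2
```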

Before we illustrate the use of η in detecting nonrandomness in a sample, we might get a feeling for its sensitivity to nonrandomness by noticing how it might vary from the value of 2 in certain situations. For example, when data have an upward trend, δ² will increase much less than s², so η will be less than 2. On the other hand, if the data rapidly go up and down, δ² will increase proportionally more than s², and then η will be greater than 2.
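The two situations just described can be seen with toy data. Both series below are hypothetical: one is a steady upward trend and the other alternates up and down.

```python
from math import fsum


def von_neumann_ratio(x):
    """eta = delta^2 / s^2 for observations x in time order."""
    n = len(x)
    mean = fsum(x) / n
    s2 = fsum((xi - mean) ** 2 for xi in x) / (n - 1)
    delta2 = fsum((b - a) ** 2 for a, b in zip(x, x[1:])) / (n - 1)
    return delta2 / s2


trend = [1, 2, 3, 4, 5, 6, 7, 8]             # steady upward trend
alternating = [1, -1, 1, -1, 1, -1, 1, -1]   # rapid up-and-down movement
print(round(von_neumann_ratio(trend), 3))        # 0.167, far below 2
print(round(von_neumann_ratio(alternating), 3))  # 3.5, well above 2
```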
Table XIX

Critical Values for η

Sample Size        Lower Critical          Upper Critical
     n           α = 0.05   α = 0.01    α = 0.05   α = 0.01
     4             0.78       0.63        3.22       3.37
     5             0.82       0.54        3.18       3.46
     6             0.89       0.56        3.11       3.44
     7             0.94       0.61        3.06       3.39
     8             0.98       0.66        3.02       3.34
     9             1.02       0.71        2.98       3.29
    10             1.06       0.75        2.94       3.25
    20             1.30       1.04        2.70       2.96
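Hart's table was computed from von Neumann's exact distribution, which we do not reproduce here. As a rough plausibility check only, the sketch below simulates independent normal samples of size n = 20 and reads off empirical 5% and 95% quantiles of η. These should land near the 1.30 and 2.70 of the n = 20 row. The seed and number of repetitions are arbitrary choices of ours.

```python
import random
from math import fsum


def von_neumann_ratio(x):
    """eta = delta^2 / s^2 for observations x in time order."""
    n = len(x)
    mean = fsum(x) / n
    s2 = fsum((xi - mean) ** 2 for xi in x) / (n - 1)
    delta2 = fsum((b - a) ** 2 for a, b in zip(x, x[1:])) / (n - 1)
    return delta2 / s2


def simulated_eta_quantiles(n, probs, reps=20000, seed=7):
    """Empirical quantiles of eta for independent normal samples of size n."""
    rng = random.Random(seed)
    etas = sorted(
        von_neumann_ratio([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)
    )
    return [etas[int(p * (reps - 1))] for p in probs]


q05, q95 = simulated_eta_quantiles(20, [0.05, 0.95])
print(round(q05, 2), round(q95, 2))  # expected to fall near 1.30 and 2.70
```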

a. Illustration 1. This illustration was first given by C. A. Bennett of General Electric. He wished to show that the runs test is not as powerful as the mean square successive difference test. He gave the following results of measuring a standard sample, listed in the order of their analysis.

First let us compute η.


Table XX

Sample Nr   Result    Difference        Sample Nr   Result    Difference
    i        x_i     x_{i+1} - x_i          i        x_i     x_{i+1} - x_i
    1       83.50        0.13              11       84.40        0.10
    2       83.63        0.53              12       84.50        0.38
    3       84.16       -0.91              13       84.88       -0.34
    4       83.25        0.11              14       84.54        0.16
    5       83.36        0.90              15       84.70        0.10
    6       84.26       -0.26              16       84.80       -0.56
    7       84.00        0.61              17       84.24       -0.13
    8       84.61       -0.15              18       84.11        0.41
    9       84.46       -0.26              19       84.52       -0.38
   10       84.20        0.20              20       84.14

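Now

$$\sum x_i = 1684.26, \qquad \sum x_i^2 = 141840.7216.$$

Hence

$$\sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = 4.1342.$$

For n = 20 we then find

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{4.1342}{19} = 0.2176.$$

Next

$$\delta^2 = \frac{\sum (x_{i+1} - x_i)^2}{n - 1} = \frac{3.4664}{19} = 0.1824.$$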
Therefore

$$\eta = \frac{\delta^2}{s^2} = \frac{0.1824}{0.2176} = 0.838.$$
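The arithmetic of this illustration can be checked directly from Table XX. The sketch below uses `math.fsum` to limit rounding error. It also computes deviations from the mean instead of using the Σx² shortcut, which loses precision when the data are large relative to their spread.

```python
from math import fsum

# Results of Table XX, in the order of analysis
results = [83.50, 83.63, 84.16, 83.25, 83.36, 84.26, 84.00, 84.61, 84.46, 84.20,
           84.40, 84.50, 84.88, 84.54, 84.70, 84.80, 84.24, 84.11, 84.52, 84.14]
n = len(results)
mean = fsum(results) / n
sum_sq_dev = fsum((x - mean) ** 2 for x in results)                    # sum (x_i - xbar)^2
sum_sq_diff = fsum((b - a) ** 2 for a, b in zip(results, results[1:]))  # sum (x_{i+1} - x_i)^2
s2 = sum_sq_dev / (n - 1)
delta2 = sum_sq_diff / (n - 1)
eta = delta2 / s2
print(f"{sum_sq_dev:.4f} {sum_sq_diff:.4f} {s2:.4f} {delta2:.4f} {eta:.3f}")
# 4.1342 3.4664 0.2176 0.1824 0.838
```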


Going back to Table XIX, we see that when n = 20 there are only two chances in 100 of η falling outside the interval (1.04, 2.96). Hence our computed value of η is significant evidence that nonrandomness is present.

An examination of the original data indicates an upward trend.

Now let us use our earlier sign, or runs, test on these data. Counting runs above and below the median of 84.25, there is a total of nine runs. For n = 20, that is, m = n = 10, the expected number of runs is eleven. Though nine is smaller, it is not significantly small at the 5% level: Table XVIII shows that for m = n = 10 this would require six runs or fewer. Therefore our runs test does not detect the nonrandomness which the other test does detect.
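The runs count for the same data can be reproduced as follows. The sketch also sums the exact probabilities Pr{U ≤ 9} for m = n = 10, which comes out near .24, far above .05. The helper `runs_pmf` repeats the run formulas under our own name.

```python
from fractions import Fraction
from itertools import groupby
from math import comb
from statistics import median


def runs_pmf(u, m, n):
    """Pr{U = u} for m elements of one kind and n of the other (m, n >= 1)."""
    if u < 2:
        return Fraction(0)
    v = u // 2
    if u % 2 == 0:
        ways = 2 * comb(m - 1, v - 1) * comb(n - 1, v - 1)
    else:
        ways = comb(m - 1, v - 1) * comb(n - 1, v) + comb(m - 1, v) * comb(n - 1, v - 1)
    return Fraction(ways, comb(m + n, m))


results = [83.50, 83.63, 84.16, 83.25, 83.36, 84.26, 84.00, 84.61, 84.46, 84.20,
           84.40, 84.50, 84.88, 84.54, 84.70, 84.80, 84.24, 84.11, 84.52, 84.14]
med = median(results)  # midway between 84.24 and 84.26
# 'A' = above the median, 'B' = below; no result equals the median here
labels = "".join("A" if x > med else "B" for x in results if x != med)
m, n = labels.count("A"), labels.count("B")
runs = sum(1 for _ in groupby(labels))
p_at_most = sum(runs_pmf(k, m, n) for k in range(2, runs + 1))  # Pr{U <= runs}
print(round(med, 2), labels, runs, round(float(p_at_most), 3))
```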

There are other tests based on runs: the length of the longest run, the distribution of runs, etc. No one test is best for detecting nonrandomness in all cases. For example, the test based on the number of runs may not indicate nonrandomness while the test based on the longest run will.




APPENDIX  A.  THEORY  AND  PRACTICE 


We have spoken of the relation between probability and relative frequency. It has been said that the gap between probability theory and practice is a difficult one to bridge. One bridge over the gap is "the law of averages," known in probability theory as the law of large numbers.

We can speak of it here since it refers to the situation in which there is a sequence of independent events with fixed probability p. If a sequence of n trials is made and the number of successes is $S_n$, the proportion of successes in n trials is $S_n/n$. We ought to have some feeling that the average $S_n/n$ approaches the fixed probability p as the number of trials gets larger and larger.
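A quick simulation of our own, not part of the original lectures, shows the proportion of heads settling toward 1/2 as the number of tosses grows. The pseudo-random generator and seed are arbitrary choices; other seeds give different but similar paths.

```python
import random

rng = random.Random(2024)  # arbitrary seed for reproducibility
checkpoints = (10, 100, 1_000, 10_000, 100_000)
proportions = {}
heads = 0
for n in range(1, checkpoints[-1] + 1):
    heads += rng.random() < 0.5  # True counts as one head
    if n in checkpoints:
        proportions[n] = heads / n
for n, share in proportions.items():
    print(f"{n:>7} tosses: proportion of heads = {share:.4f}")
```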

To this end, let us consider repeatedly tossing an unbiased coin and keeping track of the proportion of heads. The law of large numbers tells us that our hopes are not in vain; in some sense this proportion should approach 1/2. Now, we shouldn't expect this proportion to suddenly become exactly 1/2. So let's take some small allowed deviation, ε. For each number of trials n, we ask: what is the probability that the proportion of heads differs from 1/2 by no more than ε? Specifically, let ε = 10%. Then we have to find, for each n, the probability that the percentage of heads lies between 40% and 60%. Now we need a probability measure of the set of favorable sequences, i.e., those which will not deviate from 50% heads by more than 10%. Our Binomial Law provides us with this. It says the probability of exactly k heads in a sequence of n tosses is

$$\binom{n}{k}(.5)^k(.5)^{n-k}.$$


A sequence is tolerable if the number of heads k satisfies

$$\left| \frac{k}{n} - .50 \right| \le .10.$$

If n = 5, then tolerable sequences have 2 or 3 heads, since the ratios 2/5 = .40 and 3/5 = .60 are within the tolerance limits (they are exactly the limits). On the other hand, a sequence with 0, 1, 4, or 5 heads is "out." The probability of a "tolerable" sequence is the sum of the terms in the Binomial Law for those values of k for which k/n is "within limits." In the case of n = 5, this says we must add the terms of the expansion of $(.50 + .50)^5$ for which k is 2 or 3, that is,

$$10(.5)^2(.5)^3 + 10(.5)^3(.5)^2 = .625 \approx .63$$

is the probability of an acceptable sequence of 5 tosses.

If you were to go on with this by taking larger values for n, keeping ε = 10%, you would obtain, among others, the entries


  n     Number of Heads Acceptable     Probability
  5     2 or 3                          .63
 10     4, 5, or 6                      .66
 15     6, 7, 8, or 9                   .70
 20     8, 9, 10, 11, or 12             .74
100     40, ..., 60                     .96
200     80, ..., 120                    .996
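The entries of this table can be recomputed exactly with the Binomial Law. In the sketch below, `tolerable_probability` is our own name. The tolerance test uses exact fractions so that boundary cases such as k/n = .40 and .60 are counted as tolerable, as the text requires.

```python
from fractions import Fraction
from math import comb


def tolerable_probability(n, eps=Fraction(1, 10)):
    """Pr{|k/n - 1/2| <= eps} for the number of heads k in n tosses of a fair coin."""
    half = Fraction(1, 2)
    favorable = sum(comb(n, k) for k in range(n + 1) if abs(Fraction(k, n) - half) <= eps)
    return Fraction(favorable, 2 ** n)


for n in (5, 10, 15, 20, 100, 200):
    print(n, round(float(tolerable_probability(n)), 4))
```

The exact value for n = 5 is 5/8 = .625, which the table rounds to .63.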

So we see that the probability of acceptable sequences, those deviating from 50% heads by not more than 10%, steadily increases as we toss the coin more and more. However, note that no matter how large n may
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